Datastreamer migrates data pipeline from on-prem to Google Cloud and reduces costs with SADA

Just as the oil industry uses pipelines to transport large quantities of raw materials to refineries for processing, Datastreamer provides companies that are building data-intensive apps with a turnkey data pipeline to connect and analyze disparate sources of streaming raw data.

In December 2020, Datastreamer’s founders recognized an unmet need to get actionable insights from unbounded data in an organized, timely way. The company set out to replace expensive and complex do-it-yourself pipelines and make it easy for companies to enrich and integrate unstructured streaming data into their intelligent products through a single API.

“Our primary focus is unbounded data streams that essentially don’t have any limits,” said Tyler Logtenberg, Vice President of Operations at Datastreamer. “For example, there is no point at which you can say you have all the news. It just keeps coming. To serve customers needing this type of information, we digest 3 to 4 million news articles a day from all around the world and run a gamut of different machine learning models on them.” Other examples of unbounded data streams include financial trading data, web traffic, weather updates, and dark web data.
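To make the idea of an unbounded stream concrete, here is a minimal Python sketch. It is not Datastreamer’s implementation: the names `article_stream`, `enrich`, and `enriched_stream` are hypothetical, and a simple word count stands in for a real machine learning model. The point is only that the consumer processes each item as it arrives and never waits for a "complete" dataset.

```python
import itertools
from typing import Dict, Iterator


def article_stream() -> Iterator[Dict[str, str]]:
    """Hypothetical unbounded source: yields synthetic articles forever."""
    for i in itertools.count(1):
        yield {"id": str(i), "text": f"headline number {i} just arrived"}


def enrich(article: Dict[str, str]) -> Dict[str, object]:
    """Placeholder enrichment (a word count) standing in for a real model."""
    enriched: Dict[str, object] = dict(article)
    enriched["word_count"] = len(article["text"].split())
    return enriched


def enriched_stream(source: Iterator[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Enrich items lazily, one at a time, so the source never has to end."""
    for article in source:
        yield enrich(article)


# The stream has no end, so a consumer only ever looks at a finite window.
first_three = list(itertools.islice(enriched_stream(article_stream()), 3))
for item in first_three:
    print(item)
```

Because every stage is a generator, nothing is buffered beyond the item currently being processed, which is what allows the same code to run against a source that never finishes.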

Organizations across a wide variety of industries, including marketing, know your customer/anti-money laundering (KYC/AML), research, cyber intelligence, finance, and healthcare, partner with Datastreamer to leverage its infinite-data capabilities, scalability, security, and adaptability. “With this unique platform, customers decrease their time to market and reduce data integration costs without the need for additional infrastructure, development, or maintenance,” said Logtenberg.

Business challenge

When conducting a search on Datastreamer’s system, end users are sifting through a petabyte of text data alone. In the beginning, the company’s IT infrastructure was hosted and managed on-premises. As Datastreamer rapidly matured and expanded its offerings, it started to run into logistical bottlenecks.

“We needed to scale for the future and migrate to a stable cloud infrastructure that could handle the speed and volumes of our operations, yet manage costs,” said Logtenberg. “Every second we are running over 10,000 enrichments from six different AI models in real time.”

Datastreamer needed a more reliable infrastructure so that it could concentrate on building out its solution. “When it comes to platform scalability and stability issues, that needs to be someone else’s problem,” said Logtenberg. “We must be 100 percent focused on unbounded data streams.”

Solution

After reviewing options from multiple cloud providers, Datastreamer’s engineering team determined that the best way forward was to build the next version of its platform exclusively on Google Cloud.

“There were a few reasons we chose to go all in on Google Cloud. First, Compute servers and Google Kubernetes Engine are far more advanced than the competition and accommodate our heavy use of Dataflow to run our models,” said Logtenberg.

“Also, Google Cloud’s own products specialize in high volumes of searchable text data, which parallel a lot of the technologies that we’ve built in-house, so there is strong strategic alignment.”

At the recommendation of Google Cloud, Datastreamer engaged SADA, a three-time Google Cloud Reseller Partner of the Year, to assist with cost-saving measures and technical guidance.

Datastreamer appreciates SADA’s ability to get to know the operations of its business in granular detail and provide expertise on a wide array of solutions. “It’s been great to go to SADA and say, ‘We’re doing a proof of concept for this new technology and we have three questions,’ and then get quick answers,” said Logtenberg. “The research support and advice we receive from SADA is invaluable and keeps us from going down a rabbit hole.”

Results

Working with SADA, Datastreamer successfully migrated its platform to Google Cloud. “Compared to our previous on-premises situation, we reduced our cloud costs by 30 percent,” said Logtenberg.

SADA set up Committed Use Discounts (CUDs) for Datastreamer to optimize its overall Google Cloud spend, which gave the company a more consistent platform to build on. “Now, we are able to deliver more features, and because of the Committed Use Discounts that SADA showed us, our costs are more predictable and less expensive–a win-win in my book,” said Logtenberg.

SADA has also been instrumental in launching capabilities for Datastreamer customers to securely bring in and utilize their own data in the pipeline. Overall, SADA helped Datastreamer:

  • Support customized database technology on Google Cloud
  • Reduce cloud costs by 30 percent
  • Scale infrastructure to prepare for rapid growth
  • Improve stability and performance of Datastreamer services
  • Develop new capabilities and features

Datastreamer now processes 10,000 enrichments per second, 56,000 pieces of new content per second, and 1.6 million data points per second.
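For a sense of scale, the per-second figures above can be converted into daily volumes. The short Python sketch below does the arithmetic; it assumes, purely for illustration, that each rate is sustained around the clock, which the source does not state. The names `per_second_rates` and `daily_volume` are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Per-second rates quoted in the case study.
per_second_rates = {
    "enrichments": 10_000,
    "pieces of new content": 56_000,
    "data points": 1_600_000,
}


def daily_volume(rate_per_second: int) -> int:
    """Daily total, assuming the rate holds for a full 24 hours."""
    return rate_per_second * SECONDS_PER_DAY


daily_volumes = {name: daily_volume(rate) for name, rate in per_second_rates.items()}

for name, total in daily_volumes.items():
    print(f"{name}: {total:,} per day")
```

Under that assumption, 10,000 enrichments per second works out to 864 million enrichments per day, and 1.6 million data points per second to roughly 138 billion data points per day.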

“We’re always pushing the limits on Google Cloud and keen to adopt its new technologies,” said Logtenberg. “SADA is invested in us finding those solutions that will enable our business and place us at a competitive advantage. They provide sage advice, bring in appropriate resources, and really go to bat for us.”