The play for in-storage data processing to accelerate data analytics

New solutions that leverage hyperscalers extend data analytics capabilities by optimizing the management of multiple data streams.

Peter Nichol

September 21, 2021

Is there a business case for in-storage data processing? Of course there is, and I’m going to explain why.

Hi, I’m Peter Nichol, Data Science CIO.

Computational storage is one of those terms that has taken off in recent years, yet few truly understand it.

The intent of computational storage

Computational storage is all about hardware-accelerated processing and programmable storage. The general concept is to move data and compute closer together. When your data is far from your compute, it not only takes longer to process, it is also more expensive. This scenario is common in multi-cloud environments, where moving data in and egressing data out is a requirement that comes at a very high cost. So the closer we can move data to our compute power, the cheaper it will be and, ultimately, the faster we will be able to execute calculations.
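To make the data-movement argument concrete, here is a minimal sketch in Python. It is a toy model, not a benchmark: the record layout, the 16-byte record size, and the function names are all assumptions made for illustration. It compares how many bytes cross the wire when a filter runs on the host versus next to the data.

```python
from dataclasses import dataclass


@dataclass
class Record:
    sensor_id: int
    reading: float


# Assumed fixed size per record, purely for illustration.
RECORD_SIZE_BYTES = 16


def make_records(n):
    """Build deterministic synthetic data: readings cycle through 0..99."""
    return [Record(sensor_id=i, reading=float(i % 100)) for i in range(n)]


def host_side_filter(stored, threshold):
    """Conventional path: ship every record to the host, then filter."""
    bytes_moved = len(stored) * RECORD_SIZE_BYTES
    matches = [r for r in stored if r.reading >= threshold]
    return matches, bytes_moved


def in_storage_filter(stored, threshold):
    """Computational-storage path: filter near the data, ship only matches."""
    matches = [r for r in stored if r.reading >= threshold]
    bytes_moved = len(matches) * RECORD_SIZE_BYTES
    return matches, bytes_moved


records = make_records(10_000)
host_matches, host_bytes = host_side_filter(records, threshold=90.0)
near_matches, near_bytes = in_storage_filter(records, threshold=90.0)

print(f"host-side filter moved  {host_bytes} bytes")
print(f"in-storage filter moved {near_bytes} bytes")
```

Both paths return the same answer. What changes is how much data has to travel to get it, and that is where the time and cost savings would come from.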

Business cases for computational storage

The easiest way to understand computational storage is to look at a few examples. These concepts are primarily found in startups and are most commonly known as “in-situ processing” or “computational storage.”

First, let’s focus on an example around hyperscalers. Hyperscale infrastructure is used for workloads such as AI compute, high-throughput video processing, and even composable networking. Organizations like Microsoft are incorporating these technologies into their product suites. For example, Microsoft uses computational storage in its search engines through field-programmable gate arrays (FPGAs). The accelerated hardware lets the search engine return search and analytical results in microseconds. These capabilities are also expanding into other areas, such as Hadoop MapReduce, which uses DataNodes for both storage and processing.

Second, highly distributed architectures are very effective. Hyperscale architectures provide a strong foundation for scaling computational storage capabilities. The concept of separating hardware from software is not new. Even with AWS Lambda, a serverless compute service often used in data streaming pipelines, we can deconstruct an application to break its data flow into several parts. This makes managing multiple data streams much simpler. For example, data feeds can be individually ingested into a data stream, and AWS Lambda can then manage the funnel of data from that stream into computational storage. Once the stream is fed into computational storage, it can be processed more efficiently, with instructions executing faster than they would otherwise.
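The funnel described above might look something like the following sketch. It assumes a Kinesis-style event, where each record carries a base64-encoded payload, feeding a Lambda-style handler. The InMemoryComputationalStore class, the feed names, and the latency_ms field are hypothetical stand-ins invented for this example; they are not a real computational storage API.

```python
import base64
import json
from collections import defaultdict


class InMemoryComputationalStore:
    """Hypothetical stand-in for a computational storage target."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def write(self, feed, record):
        self.partitions[feed].append(record)

    def average(self, feed, field):
        """Compute next to the data and return only a small result."""
        values = [r[field] for r in self.partitions[feed]]
        return sum(values) / len(values) if values else None


STORE = InMemoryComputationalStore()


def handler(event, context):
    """Lambda-style handler: decode each stream record and funnel it by feed."""
    written = 0
    for item in event.get("Records", []):
        payload = json.loads(base64.b64decode(item["kinesis"]["data"]))
        STORE.write(payload["feed"], payload)
        written += 1
    return {"written": written}


def make_event(payloads):
    """Build a Kinesis-shaped test event from plain dictionaries."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(p).encode()).decode()}}
            for p in payloads
        ]
    }


event = make_event([
    {"feed": "clicks", "latency_ms": 10},
    {"feed": "clicks", "latency_ms": 30},
    {"feed": "orders", "latency_ms": 50},
])
result = handler(event, None)
print(result, STORE.average("clicks", "latency_ms"))
```

Each feed lands in its own partition, and the aggregate is computed where the data sits, so only the summary value needs to travel back to the caller.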

Why look at computational storage now?

Do we as leaders even care about computational storage? Yes, we do. Here’s why.

Looking over the last year or even the previous decade, the way data is stored has been designed and architected around how CPUs process that data. That was fine while hardware designs aligned with the way data was processed. Unfortunately, how CPUs are designed and architected has changed dramatically over the last ten years. As a result, we need to change how we store data to process it more effectively, faster, and cheaper.

The industry trend to adopt computational storage

Snowflake is an excellent example of a product that separates compute from storage. This gives business leaders a clear opportunity to realize the benefits of that separation, which helps accelerate data read and processing cycle times. Users experience faster application and interface responses, with faster visualization and presentation of the requested data.

If you’re curious to research additional topics around computational storage, the Storage Networking Industry Association (SNIA) formed a working group in 2018 charged with defining vendor-agnostic interoperability standards for computational storage.

As you think about your technology environment and how you’re leveraging and processing data, consider how far your data is from your ability to compute that data and process it analytically. Data needs are growing exponentially, and the demand for computational storage will be tightly coupled to the need to display and visualize organizational data. Your organization might benefit from leveraging computational storage to connect high-performance computing with traditional storage devices.

If you found this article helpful, that’s great! Also, check out my books, Think Lead Disrupt and Leading with Value. They were published in early 2021 and are available on Amazon and at http://www.datsciencecio.com/shop for author-signed copies!

Hi, I’m Peter Nichol, Data Science CIO. Have a great day!