The play for in-storage data processing to accelerate data analytics

New solutions that leverage hyperscalers extend data analytics capabilities by optimizing the management of multiple data streams.

Is there a business case for in-storage data processing? Of course, there is, and I’m going to explain why.

Hi, I’m Peter Nichol, Data Science CIO.

Computational storage is one of those terms that’s taken off in recent years that few truly understand.

The intent of computational storage

Computational storage is all about hardware-accelerated processing and programmable computational storage. The general concept is to more data and computers closer together. The idea is that when your data is far away from your compute, it not only takes longer to process, but it’s more expensive. This scenario is common in multi-cloud environments where moving and erasing data out is a requirement, but that requirement comes at a very high cost. So the closer we can move that data to our compute power, the cheaper it will be, and ultimately, the faster we will be able to execute calculations.

Business cases for computational storage

The easier way to understand computation storage is to observe a few examples. These concepts are primarily embedded in startups and are most commonly known as “in-situ processing” or “computational storage.”

First, let’s focus on an example around hyperscalers. Hyperscale is used to do things like AI compute, high throughput video processing, and even composable networking. When we observe organizations like Microsoft, they are incorporating these technologies into their product suites. For example, Microsoft is using computation storage in their search engines with the application of use field-programmable gate arrays (FPGAs). The accelerated hardware enables search engines and can provide those credentials and analytical results in less than microseconds. Also, these capabilities are expanding into other capabilities like Hadoop MapReduce using DataNodes for storage and processing.

Second, architectures that are highly distributed are very effective. Hyperscale architectures build a great foundation to scale computational compute capabilities. The concept of segmenting hardware to software is not new. Even AWS Lambda—typically a data streaming capability—we can deconstruct an application to break out data flow into several parts. This makes managing multiple data streams much more streamlined. For example, data feeds can be individually ingested into a data stream, then AWS Lambda can manage the data funnel from AWS Lambda into computational storage. Once that stream is fed into computational storage, that data stream is more efficient and capable of executing instruction even faster than if not fed into computational storage.

Why look at computational storage now?

Do we as leaders even care about computational compute? Yes, we do. Here’s why.

Looking over the last year or even the previous decade, the way data is stored is designed and architected based on how CPUs are designed to process that data.  That was great when the hardware designs aligned to the way data was processed. But, unfortunately, how CPUs were designed and architected over the last ten years has dramatically changed. And as a result, we need to change how we store data to process it more effectively, faster, and cheaper.

The industry trend to adopt computational storage

Snowflake is an excellent example of a product that separates the compute from the storage processing. This results in a perfect opportunity for business leaders to realize the benefits of separating computing from storage. This helps accelerate data read and processing cycle times. The advantage is that users experience faster application and interface responses with faster visualization and presentment of the data requested.

If you’re curious to research additional topics around computational storage, the Storage Networking Industry Association (SNIA) formed a working group in 2018 that has the charge to define vendor-agnostic interoperability standards for computational storage.

As you think about your technology environment and how you’re leveraging and processing data, consider how far your data is from your ability to compute that data and process it analytically. Data needs are growing exponentially, and the demand for computational storage will be tightly coupled to the need to display and visualize organizational data. Your organization might benefit from levering computational storage to connect high-performance computing with traditional storage devices.

If you found this article helpful, that’s great! Also, check out my books, Think Lead Disrupt and Leading with Value. They were published in early 2021 and are available on Amazon and at http://www.datsciencecio.com/shop for author-signed copies!

Hi, I’m Peter Nichol, Data Science CIO. Have a great day!

Previous articleThe factors quietly disrupting your business and what to do about it
Next articleThe new age of analytics fueled by hyper-converged storage platforms
Peter is a technology executive with over 20 years of experience, dedicated to driving innovation, digital transformation, leadership, and data in business. He helps organizations connect strategy to execution to maximize company performance. He has been recognized for Digital Innovation by CIO 100, MIT Sloan, Computerworld, and the Project Management Institute. As Managing Director at OROCA Innovations, Peter leads the CXO advisory services practice, driving digital strategies. Peter was honored as an MIT Sloan CIO Leadership Award Finalist in 2015 and is a regular contributor to CIO.com on innovation. Peter has led businesses through complex changes, including the adoption of data-first approaches for portfolio management, lean six sigma for operational excellence, departmental transformations, process improvements, maximizing team performance, designing new IT operating models, digitizing platforms, leading large-scale mission-critical technology deployments, product management, agile methodologies, and building high-performance teams. As Chief Information Officer, Peter was responsible for Connecticut’s Health Insurance Exchange’s (HIX) industry-leading digital platform transforming consumerism and retail-oriented services for the health insurance industry. Peter championed the Connecticut marketplace digital implementation with a transformational cloud-based SaaS platform and mobile application recognized as a 2014 PMI Project of the Year Award finalist, CIO 100, and awards for best digital services, API, and platform. He also received a lifetime achievement award for leadership and digital transformation, honored as a 2016 Computerworld Premier 100 IT Leader. Peter is the author of Learning Intelligence: Expand Thinking. Absorb Alternative. Unlock Possibilities (2017), which Marshall Goldsmith, author of the New York Times No. 1 bestseller Triggers, calls "a must-read for any leader wanting to compete in the innovation-powered landscape of today." Peter also authored The Power of Blockchain for Healthcare: How Blockchain Will Ignite The Future of Healthcare (2017), the first book to explore the vast opportunities for blockchain to transform the patient experience. Peter has a B.S. in C.I.S from Bentley University and an MBA from Quinnipiac University, where he graduated Summa Cum Laude. He earned his PMP® in 2001 and is a certified Six Sigma Master Black Belt, Masters in Business Relationship Management (MBRM) and Certified Scrum Master. As a Commercial Rated Aviation Pilot and Master Scuba Diver, Peter understands first hand, how to anticipate change and lead boldly.