Measuring the value of data quality

Data intelligence, integrity, and integration figure into every CIO strategy in some fashion. You can’t achieve data insights without first cleaning up your bad data, and to do that, you must define the value of that data.

The effect that bad data has on our organization is personal. It’s time we put a value on data and, more specifically, a value on bad data. Let’s fix this together.

Wikibon, a community of open-source, advisory-sharing practitioners, estimates that the worldwide data market will grow at an impressive 11.4% CAGR between 2020 and 2027, reaching $103 billion by 2027. That’s the upside. The downside: IBM estimates that bad data already costs the US economy around $3 trillion a year.

One trillion, three trillion, 10 trillion: it all sounds like a lot. But do you understand the impact of $3 trillion? Probably not. What you do understand is the personal impact that bad data issues have on your family. Yes, your family.

It’s 7 pm, and you’d told yourself you’d wrap up early for a change, but then that email comes in. It’s from your business partner, who just got around to opening the executive status report or those financials she finally had time to read through. Either way, a quick look at the data, and you already know it’s bad and can’t be real. So much for getting out early.

The real cost of bad data isn’t the cost to reacquire a data set that’s been canned or the cost to implement a new system to replace the old. Every executive knows the real cost is personal: time away from friends and family. This is true both for that person, who’s buffering a wave of irate executives, and for their team, which is triaging the root cause and adding weight to an already lopsided work-life balance.

The business impact of poor data

On the topic of data, the first thing that comes to mind is the dimensions of data: format, authority, credibility, relevance, readability, usefulness, reputation, involvement, completeness, consistency, timeliness, uniqueness, accuracy, validity, lineage, currency, precision, accessibility, and representation. These are valid and, at times, very important. They are not, however, the most important consideration. We need to frame the problem of bad data through the lens of our business partners.

Data should be listed as an asset or liability on the company balance sheet, but today we don’t treat it as a hard asset, and we don’t depreciate purchased data. Until that changes, we need to look at the utility of the data and the value we expect to derive from it.

Let’s start by looking at the categories where poor data inhibits business success. By viewing data errors within a classification scheme, we can begin to quantify the value of bad data. A simple approach is to use four general categories:

  1. Financial
  2. Confidence
  3. Productivity
  4. Risk

Financial primarily focuses on missed opportunities. This could be in the form of decreased revenues, reductions in cash flow, penalties, fines, or increased operating costs.

Confidence and satisfaction have both an internal and an external impact. Internal impacts include reduced employee engagement, decreased organizational trust, low forecasting confidence, and inconsistent operational or management reporting. External confidence addresses how customers or suppliers feel about delays or impacts to their business. Ultimately, bad data leads to bad or incorrect decisions.

Productivity is about throughput. Poor data increases cycle times, which leads to drops in product quality, limited throughput, and heavier workloads. These are all aspects that affect outcomes.

Risk and compliance center on leakage. This could be value leakage, investment leakage, or competitive leakage, or, more seriously, data that directly affects a regulation, resulting in downstream financial and business impacts.

This framework helps distinguish the bad-data issues that affect either “running the business” or “improving the business” from those that are inconvenient but largely insignificant.

Categorizing the impact

Before we tackle what the business expects from its data and how poor data impacts the business, we need to create some subcategories to continue our assessment.

Using the four core categories above, let’s deconstruct these into more specific areas of focus:

Financial

  • Direct operating expense
  • Resource overhead
  • Fees
  • Revenue
  • Systems of production
  • Delivery and transport
  • Supplier services
  • Cost of goods sold
  • Demand management
  • Depreciation
  • Leakage
  • Capitalization

Confidence

  • Forecasting
  • Reporting
  • Decision-making
  • Satisfaction (internal and external)
  • Customer interaction
  • Supplier or collaborator trust

Productivity

  • Workloads
  • Throughput
  • Output quality
  • Staffing optimization
  • Asset optimization
  • Service-level alignment
  • Efficiency
  • Defects
  • Downstream

Risk

  • Financial
  • Legal
  • Market
  • Systems
  • Operational downtime
  • Reputation loss
  • Testing
  • Vulnerability remediation
  • Regulatory, compliance and audit
  • Security

Direct operating expenses involve direct labor and materials. Resource overhead accounts for the additional staff needed to run the business, such as recruiting or training personnel. Fees are penalties from mergers and acquisitions or other service charges. Revenue captures missed customer opportunities, such as lost customer retention. Cost of goods sold covers standard product design, the cost of inventory, and so on. Depreciation covers inventory markdowns or decreases in property value. Leakage is mainly financial and involves fraud and collections; however, it can be extended to include value leakage, such as the impact of not realizing the efficiency from a system intended to save 250 team members 20% of their day through automation (a worked example follows below). Capitalization quantifies the value of equity.
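To make that value-leakage example concrete, here is a minimal back-of-the-envelope sketch in Python. The blended rate, workday length, and working days per year are illustrative assumptions, not figures from this article:

    # Hypothetical value-leakage estimate for the automation example above.
    # The rate, workday, and calendar assumptions are illustrative only.
    TEAM_SIZE = 250           # team members the automation was meant to help
    TIME_SAVED_PCT = 0.20     # 20% of each person's day
    HOURS_PER_DAY = 8         # assumed standard workday
    BLENDED_RATE = 75         # assumed blended cost per hour, in dollars
    WORKING_DAYS = 230        # assumed working days per year

    hours_lost_per_day = TEAM_SIZE * TIME_SAVED_PCT * HOURS_PER_DAY
    annual_value_leakage = hours_lost_per_day * BLENDED_RATE * WORKING_DAYS

    print(f"Hours not recovered per day: {hours_lost_per_day:,.0f}")     # 400
    print(f"Annual value leakage:        ${annual_value_leakage:,.0f}")  # $6,900,000

Swap in your own rates; the point is that unrealized efficiency is quantifiable, not hypothetical.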

Forecasting is the impact of bad decisions made on financial data, such as actual vs. budget, or the resource-management cost of not staffing proactively. Decision-making is usually event-driven and links the time it takes to make a decision with the quality of that decision. Satisfaction is largely the internal service relationship between providers and consumers; e.g., shadow IT, outsourcing, etc. Supplier or collaborator trust measures how well the procurement process is optimized and how much confidence vendors have in the provider.

Workloads target an increase in work over a baseline. Throughput measures the volume of outputs, typically in cycles; for example, the time required to analyze a process or to prepare data for input into a process. Output quality is about trust in published information; for example, is the data in the status report trusted by the leaders who receive it, or is the report dismissed due to mistrust? Efficiency looks at avoiding waste in people, processes, or technology. Defects highlight deviations from the norm, whether in a process, system, or product. Downstream evaluates the next process in the chain and the delays it experiences because of upstream data-quality issues.

Financial risk targets the bottom line in either hard or soft losses. Legal risk removes protections or increases exposure. Market risk could involve competitiveness or the loss of customer goodwill. System risk covers delays in deployments. Testing risk is the loss of functionality when non-working components are released into production or released with defects. Regulatory and compliance risk often deals with reporting or, more importantly, the data and data quality that’s being officially reported. Security risk, a growing concern, addresses data that affects internal customers (employees) or external customers (suppliers).

These aspects all tie poor data quality to negative impacts on the business. Typically, organizations log quality issues in a tracking system, which helps quantify the impact and play back the value of the data management organization or the office of the chief data officer. A minimal sketch of such a log follows.
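As an illustration only (the field names, categories, and dollar figures below are assumptions for the sketch, not a prescribed schema), a quality-issue log can be as simple as tagging each issue with one of the four categories and an estimated cost, then rolling the totals up for playback:

    # Minimal sketch of a data-quality issue log using the four categories above.
    # Field names and example figures are illustrative, not a standard.
    from collections import defaultdict
    from dataclasses import dataclass

    CATEGORIES = {"Financial", "Confidence", "Productivity", "Risk"}

    @dataclass
    class DataQualityIssue:
        summary: str
        category: str          # one of CATEGORIES
        subcategory: str       # e.g., "Fees", "Forecasting", "Throughput"
        estimated_cost: float  # quantified business impact, in dollars

        def __post_init__(self):
            if self.category not in CATEGORIES:
                raise ValueError(f"Unknown category: {self.category}")

    def impact_by_category(issues):
        """Roll up estimated cost per category for playback to leadership."""
        totals = defaultdict(float)
        for issue in issues:
            totals[issue.category] += issue.estimated_cost
        return dict(totals)

    log = [
        DataQualityIssue("Duplicate vendor records", "Financial", "Fees", 12_000),
        DataQualityIssue("Mistrusted status report", "Confidence", "Reporting", 4_500),
        DataQualityIssue("Rework on monthly close", "Productivity", "Workloads", 8_200),
    ]
    print(impact_by_category(log))
    # {'Financial': 12000.0, 'Confidence': 4500.0, 'Productivity': 8200.0}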

Ask questions

What makes poor-quality data a critical business problem? If the problems are merely inconvenient, no action needs to be taken. Use these questions to elicit ideas for quantifying the impact of bad data in your organization:

Financial

  • Which transformation efforts are on hold (e.g., data lake, analytics)?
  • How much time is consumed by cleaning portfolio or financial data?
  • What decisions aren’t being made due to lack of vendor management benchmarking?

Confidence

  • How efficient is the vendor-onboarding cycle?
  • Is there organizational confidence in company-specific data (e.g., patients, genes, products)?
  • How easily can suppliers’ or collaborators’ data be validated?

Productivity

  • Where can RPA be used to streamline data processing?
  • How many processes are repetitive and require minimal human intervention?
  • How do we remove waste from our process? What was good yesterday might not be good today.

Risk

  • Do events have lineage? If not, what’s the cost to compile that view on request per event?
  • What’s the cost of a single regulatory misfiling or violation?
  • How can an incorrect risk assessment put the company at risk?

The categories and subcategories are limitless and can quickly become all-encompassing. Start simple. Use the idea of an N-of-1, or a single event. We might be talking about the history of a cell line in biotechnology or the history of changes to portfolio financials. Instead of estimating the cost of researching data lineage in general, consider one specific example. Collapse the steps into five core phases and estimate the effort for each. Then identify the people involved at each step. From there, apply a blended rate and multiply the effort of each step by that rate. The result is the net cost for an N-of-1 event: the cost of a single event (see the sketch below). This methodology is very powerful when addressing bad-data problems.
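Here is a minimal sketch of that N-of-1 calculation. The phase names, effort estimates, and blended rate are made-up placeholders; substitute your own:

    # N-of-1 cost estimate: the cost of researching lineage for one event.
    # Phase names, hours, and the blended rate are illustrative assumptions.
    BLENDED_RATE = 95  # assumed blended cost per person-hour, in dollars

    # (phase, estimated effort in person-hours across everyone involved)
    phases = [
        ("Identify source systems", 6),
        ("Pull and reconcile records", 10),
        ("Interview data owners", 4),
        ("Rebuild the lineage view", 8),
        ("Review and publish findings", 3),
    ]

    total_hours = sum(hours for _, hours in phases)
    n_of_1_cost = total_hours * BLENDED_RATE

    for phase, hours in phases:
        print(f"{phase:<30} {hours:>3} h  ${hours * BLENDED_RATE:,}")
    print(f"{'N-of-1 cost':<30} {total_hours:>3} h  ${n_of_1_cost:,}")

Multiply that single-event cost by how often the event occurs in a year and you have an annualized number your business partners can react to.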

The flip side

We understand the risk of poor data quality, and it’s important to quantify the financial impact to the organization of not correcting bad data. Putting standard data definitions or organizational data dictionaries in place builds consistent terminology. Implementing guidance on how data should be reported can spell out permitted values and make inaccurate data easier to flag (a simple sketch follows).
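As a simple illustration (the fields, permitted values, and ranges below are hypothetical, not an organizational standard), a data dictionary can be expressed as a small set of rules that flag values outside what’s permitted:

    # Minimal data-dictionary sketch: permitted values plus basic validation.
    # Field names and rules are hypothetical examples.
    data_dictionary = {
        "region":   {"permitted": {"AMER", "EMEA", "APAC"}},
        "status":   {"permitted": {"active", "inactive", "pending"}},
        "discount": {"min": 0.0, "max": 0.30},  # fraction of list price
    }

    def validate(record):
        """Return a list of data-quality findings for a single record."""
        findings = []
        for field, rule in data_dictionary.items():
            value = record.get(field)
            if value is None:
                findings.append(f"{field}: missing")
            elif "permitted" in rule and value not in rule["permitted"]:
                findings.append(f"{field}: '{value}' is not a permitted value")
            elif "min" in rule and not (rule["min"] <= value <= rule["max"]):
                findings.append(f"{field}: {value} is outside {rule['min']}-{rule['max']}")
        return findings

    print(validate({"region": "AMER", "status": "archived", "discount": 0.45}))
    # ["status: 'archived' is not a permitted value", "discount: 0.45 is outside 0.0-0.3"]

Even a rules file this small gives the organization a shared definition of “bad” that can be checked before the data reaches a status report.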

Like any other discipline, this process requires training and education. Take time to invest in your department and organization. Your team will thank you for it.
