When it comes to healthcare, data analytics isn’t just a ‘nice to have’; it can dramatically improve patient outcomes. Back in 2014, a data infrastructure solution called a health information exchange (HIE) was accessed only 2.4% of the time, but patients whose providers examined their previous health data were 30% less likely to end up in the hospital. What’s more, the healthcare network in question saw annual savings of $357,000. The future value of those savings, accumulated from 2014 to 2020, is a remarkable $2.8M.
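For readers who want to check the arithmetic, here is a minimal sketch of a future-value calculation. The article gives the annual savings and the $2.8M result but not the interest rate or the number of deposits, so the 4% rate and the seven year-end deposits (2014 through 2020 inclusive) below are assumptions chosen for illustration, not figures from the original analysis.

```python
def future_value_of_annuity(payment: float, rate: float, periods: int) -> float:
    """Future value of equal end-of-period deposits, compounded once per period."""
    if rate == 0:
        return payment * periods
    return payment * ((1 + rate) ** periods - 1) / rate


ANNUAL_SAVINGS = 357_000  # annual savings quoted in the article
ASSUMED_RATE = 0.04       # assumption: rate is not stated in the article
ASSUMED_PERIODS = 7       # assumption: one deposit per year, 2014 through 2020

fv = future_value_of_annuity(ANNUAL_SAVINGS, ASSUMED_RATE, ASSUMED_PERIODS)
print(f"Future value under these assumptions: ${fv:,.0f}")
```

Under these assumptions the result lands close to the quoted $2.8M; a different rate or period count would shift it.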
But even with that financial ROI, it still took the COVID-19 pandemic, arriving six years later in February 2020, to make use of HIE services triple, as providers came to understand the importance of sharing patient data.
There are other examples of technical successes in data infrastructure projects, but it can be difficult to get people to use these data systems or build the infrastructure for the right business problem. Let’s say you are committed to data sharing, analytics and deriving benefits from your health data gold mine; what solution challenges can occur and how do you avoid them?
A division of one of our customers, one of the largest national commercial health insurance companies, has a mission to reimagine health insurance as a digital business. The insights it can gain from patient outcomes, clinic visits, treatment regimens, pharmaceutical courses and doses, and myriad other clinical data sources all depend upon its data infrastructure. Unfortunately, like many other large businesses, it has big data headaches preventing it from achieving its vision. Recognize any?
Focusing on architecture in a top-down fashion, while listening to a wide variety of users, inevitably resulted in an architecture fit for no use case in particular. Additionally, over time, different data analysis and application teams with different remits, each autonomous and isolated, often encountered the same business and technical issues. This unfortunately meant they continuously relearned how to overcome recurrent issues, with no reuse of these learnings over time. For example, the same arduous effort to understand, correlate and join data between systems was repeated over and over. Another customer has 28,000+ data warehouse scripts supporting its platform. There is massive redundancy in these scripts because they are not implemented in a modular fashion. They point to the same source tables, perform the same joins, and replicate the same business rule transformations against those tables. This "cut and paste" form of reusability makes adapting those business rules a massive undertaking, and that body of scripts represents significant technical debt.
We have all seen typical data solutions fail publicly in the press or end quietly inside our own companies, each with its familiar promises and symptoms.
So on top of the technical challenges, there are two broader issues: users sometimes don’t know a system exists or don’t use it enough, or IT has spent so much time trying to make it useful to everyone that it becomes useful to no one. Is it then possible to connect these two groups, users and IT, given all the challenges, and build an appropriate system in a timely manner? Is it possible to build a system serving multiple, parallel uses and users without ‘cut and paste’ reuse? Is there truly a solution?
We call it the Data Mesh.
The ROI of continuous data analytics and its benefits
Our success example above reduced patient hospital admissions by 30% by using data engineering to link and analyze critical, historical, clinical information. Laudable as that is, our Data Mesh approach offers more general benefits. It applies domain thinking that preserves the business meaning of data, and platform thinking to speed up delivery and serve data securely. As a result, Data Mesh driven projects have broken time-to-market records for delivery at our customers.
A vital output of our Data Mesh approach is called a ‘Data Product’. It serves a business community for one or more business use cases. The Data Product is essentially a topic or domain-specific data set that is continuously and automatically updated. It is built very quickly (days to weeks). The Data Product is then typically accessed by an analytical application, such as a business intelligence, machine learning or statistical modeling application. Tools such as R, TensorFlow, PyTorch, Power BI or Tableau connect natively to the Data Product because the Data Mesh tooling can generate the connection types automatically. An important metric for measuring the Data Mesh’s time-to-market improvement is:
Metric: number of data products deployed live to customer groups per time period (that is, the “time to market” to create a business-relevant Data Mesh data set).
This metric must be calculated with care so that it does not reward a massive proliferation of data products. It should count only data products that produce analytical ROI or business impact, that have a non-zero number of business users, and that were delivered in a timely manner in the minds of those users.
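One way to make these counting rules concrete is sketched below. The `DataProduct` fields, the function name and the sample records are all hypothetical, invented for illustration; how business impact and timeliness are actually judged is left to the organization.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataProduct:
    # All field names are hypothetical, chosen to mirror the rules above.
    name: str
    go_live: date
    business_users: int
    has_business_impact: bool  # produces analytical ROI or business impact
    delivered_timely: bool     # as judged by the product's users


def qualifying_products_per_period(products, start: date, end: date) -> int:
    """Count data products that went live in [start, end] and meet every rule."""
    return sum(
        1
        for p in products
        if start <= p.go_live <= end
        and p.business_users > 0
        and p.has_business_impact
        and p.delivered_timely
    )


# Made-up sample data: only the first record satisfies every rule.
sample = [
    DataProduct("claims_summary", date(2020, 4, 6), 120, True, True),
    DataProduct("unused_extract", date(2020, 4, 9), 0, True, True),
    DataProduct("late_report", date(2020, 4, 20), 35, True, False),
    DataProduct("older_product", date(2019, 11, 1), 80, True, True),
]

count = qualifying_products_per_period(sample, date(2020, 4, 1), date(2020, 4, 30))
print(f"Qualifying data products in period: {count}")
```

Filtering before counting keeps the metric from rising simply because more data sets were published.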
The aforementioned healthcare company is nearly two years into its Data Mesh adoption. Early in the COVID-19 pandemic, the company tasked a data product development team with developing a set of COVID-care ‘data products’ for its members as quickly as possible. In this case the team created eight data products within three weeks. On a separate project, another data product team built 50 data products that went live within two months; these data products were accessed by 4,800 business users running Power BI on top of them to improve healthcare. Overall, we have multiple, parallel data teams delivering with the Data Mesh approach. Previously, the architecture-driven, single-threaded data teams had been unable to produce a single useful data set in over a year.
Data Mesh-oriented cloud development is much faster than standard data lake and warehouse solution development. By our reckoning, you can reduce the time it takes to get valuable insights from quarters and months to weeks.
The Data Mesh approach also provided our client with many operational improvements:
Lean process improvements
Lean operational improvements
The Data Mesh is a next-generation data engineering approach and platform. Its first principle is that data domains (e.g. business data or business objects) are the first thing to define and discover when you want to deliver a data system for analytics quickly. An example of a data domain is claims, served by a claims data product that holds healthcare patient claim data.
The next important concept is that data should be treated as a product, produced and owned by independent cross-functional product teams who have embedded data engineers. The advantages are that this encourages reuse across the organization and that the owning team shepherds the product's maturity over time. Data products should be distributed across the enterprise, instead of being gathered into one big data product, otherwise known as a data lake.
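As a rough illustration of these two ideas, domains first and data as a team-owned product, the sketch below models a small catalog in which each data product declares its domain and its owning team. The class, the field names and every catalog entry other than the claims domain mentioned above are hypothetical; this is not the schema of any particular Data Mesh tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProductDescriptor:
    # Hypothetical descriptor: each product names its domain and owning team.
    name: str
    domain: str
    owner_team: str


def products_by_domain(catalog):
    """Group data product names by the domain that owns them."""
    grouped = defaultdict(list)
    for product in catalog:
        grouped[product.domain].append(product.name)
    return dict(grouped)


# Illustrative catalog: several small, owned products instead of one big lake.
catalog = [
    DataProductDescriptor("patient_claims", "claims", "claims-product-team"),
    DataProductDescriptor("claim_denials", "claims", "claims-product-team"),
    DataProductDescriptor("covid_care_visits", "covid_care", "covid-product-team"),
]

for domain, names in products_by_domain(catalog).items():
    print(domain, names)
```

Keeping ownership on the descriptor itself means each product can be traced to the team responsible for its quality and evolution.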