Data integrity at the origin

This blip is not on the current edition of the Radar. If it was on one of the last few editions it is likely that it is still relevant. If the blip is older it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the RadarUnderstand more
Published: Nov 20, 2019
Nov 2019

Today, many organizations' answer to unlocking data for analytical usage is to build a labyrinth of data pipelines. Pipelines retrieve data from one or multiple sources, cleanse it and then transform and move it to another location for consumption. This approach to data management often leaves the consuming pipelines with the difficult task of verifying the inbound data's integrity and building complex logic to cleanse the data to meet its required level of quality. The fundamental problem is that the source of the data has no incentive and accountability for providing quality data to its consumers. For this reason, we strongly advocate for data integrity at the origin, by which we mean, any source that provides consumable data must describe its measures of data quality explicitly and guarantee those measures. The main reason behind this is that the originating systems and teams are most intimately familiar with their data and best positioned to fix it at the source. Data mesh architecture takes this one step further, comparing consumable data to a product, where data quality and its objectives are integral attributes of every shared data set.