Data integrity at the origin
Today, many organizations' answer to unlocking data for analytical usage is to build a labyrinth of data pipelines. Pipelines retrieve data from one or more sources, cleanse it, and then transform and move it to another location for consumption. This approach to data management often leaves the consuming pipelines with the difficult task of verifying the inbound data's integrity and building complex logic to cleanse the data until it meets the required level of quality. The fundamental problem is that the source of the data has neither the incentive nor the accountability to provide quality data to its consumers. For this reason, we strongly advocate for data integrity at the origin, by which we mean that any source providing consumable data must describe its measures of data quality explicitly and guarantee those measures. The main reason is that the originating systems and teams are most intimately familiar with their data and are best positioned to fix it at the source. Data mesh architecture takes this one step further, comparing consumable data to a product, where data quality and its objectives are integral attributes of every shared data set.
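To make "describe its measures of data quality explicitly and guarantee those measures" concrete, here is a minimal sketch of what a source-side check might look like. Everything in it is an assumption made for illustration: the names `QualityMeasure`, `completeness`, `uniqueness`, `publish` and `ORDERS_MEASURES`, the example fields, and the thresholds are all hypothetical and do not come from any particular tool or data mesh platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Record = Dict[str, object]  # one row of consumable data (hypothetical shape)
Scorer = Callable[[Sequence[Record]], float]


@dataclass(frozen=True)
class QualityMeasure:
    """A quality measure the source declares and commits to (hypothetical)."""
    name: str
    description: str
    threshold: float  # minimum guaranteed score, between 0.0 and 1.0
    score: Scorer


def completeness(field: str) -> Scorer:
    """Fraction of records in which `field` is present and non-empty."""
    def score(records: Sequence[Record]) -> float:
        if not records:
            return 1.0
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        return filled / len(records)
    return score


def uniqueness(field: str) -> Scorer:
    """Fraction of distinct values of `field` across all records."""
    def score(records: Sequence[Record]) -> float:
        if not records:
            return 1.0
        values = [r.get(field) for r in records]
        return len(set(values)) / len(values)
    return score


class QualityViolation(Exception):
    """Raised when the source cannot honor one of its declared guarantees."""


def publish(records: Sequence[Record],
            measures: List[QualityMeasure]) -> Dict[str, float]:
    """Check every declared measure before the data leaves the source.

    Returns the measured scores so they can be shared alongside the data;
    raises QualityViolation instead of shipping data below a guarantee.
    """
    report: Dict[str, float] = {}
    for measure in measures:
        value = measure.score(records)
        if value < measure.threshold:
            raise QualityViolation(
                f"{measure.name}: {value:.2f} is below the guaranteed "
                f"{measure.threshold:.2f}"
            )
        report[measure.name] = value
    return report


# Hypothetical declaration owned by the team that produces the data set.
ORDERS_MEASURES = [
    QualityMeasure("order_id_unique", "No duplicate order ids",
                   1.0, uniqueness("order_id")),
    QualityMeasure("email_complete", "At least 95% of orders carry an email",
                   0.95, completeness("email")),
]
```

The design choice in this sketch is that the check runs in the producing system and fails there, so a violation surfaces to the team that can fix the data rather than to a downstream pipeline. A consumer would read the declared measures and the returned report instead of re-deriving quality rules on its own.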