Last updated : Apr 03, 2024
Apr 2024
Trial

When you build data products using data product thinking, it's essential to consider data lineage, data discoverability and data governance. Our teams have found that DataHub provides particularly useful support here. Earlier versions of DataHub required you to fork the project and manage the sync from the main product whenever you needed to update the metadata model; recent releases allow our teams to implement custom metadata models with a plugin-based architecture. Another useful feature of DataHub is its robust end-to-end data lineage, from source through processing to consumption. DataHub supports both push-based integration and pull-based lineage extraction that automatically crawls technical metadata across data sources, schedulers, orchestrators (for example, by scanning Airflow DAGs), processing pipeline tasks and dashboards, to name a few. As an open-source option for a holistic data catalog, DataHub is emerging as a default choice for our teams.
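To make the idea of end-to-end lineage concrete, here is a minimal sketch in plain Python. It does not use the DataHub API: `LineageGraph`, `crawl_dag` and the asset names are all hypothetical, chosen only to show how pushed edges and edges pulled from a scheduler description can be combined into one graph and traversed from source to consumption.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical in-memory lineage store (illustrative, not DataHub's model)."""

    def __init__(self):
        self._downstream = defaultdict(set)

    def add_edge(self, upstream, downstream):
        self._downstream[upstream].add(downstream)

    def downstream_of(self, node):
        """Return every asset reachable from node, in breadth-first order."""
        seen, order, queue = {node}, [], deque([node])
        while queue:
            current = queue.popleft()
            for child in sorted(self._downstream[current]):
                if child not in seen:
                    seen.add(child)
                    order.append(child)
                    queue.append(child)
        return order


def crawl_dag(dag):
    """Pull-style extraction: derive edges from a scheduler's task description."""
    for task in dag["tasks"]:
        for src in task["inputs"]:
            for dst in task["outputs"]:
                yield src, dst


graph = LineageGraph()

# Push: a team declares an edge for a pipeline it owns.
graph.add_edge("orders_db.orders", "raw.orders")

# Pull: a crawler reads a (hypothetical) DAG description and extracts edges.
dag = {
    "tasks": [
        {"inputs": ["raw.orders"], "outputs": ["mart.daily_revenue"]},
        {"inputs": ["mart.daily_revenue"], "outputs": ["dashboard.revenue"]},
    ]
}
for up, down in crawl_dag(dag):
    graph.add_edge(up, down)

lineage = graph.downstream_of("orders_db.orders")
print(lineage)
```

Both kinds of edge land in the same graph, so a consumer can trace a dashboard back to its source without knowing which mechanism recorded each hop.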

Oct 2022
Trial

Since we first mentioned data discoverability in the Radar, LinkedIn has evolved WhereHows to DataHub, the next-generation platform that addresses data discoverability via an extensible metadata system. Instead of crawling and pulling metadata, DataHub adopts a push-based model where individual components of the data ecosystem publish metadata via an API or a stream to the central platform. This push-based integration shifts ownership from the central entity to individual teams, making them accountable for their metadata. As a result, we've used DataHub successfully as an organization-wide metadata repository and entry point for multiple autonomously maintained data products. When taking this approach, be sure to keep it lightweight and avoid the slippery slope leading to centralized control over a shared resource.
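The push-based model described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not DataHub's API: `MetadataEvent`, `CentralCatalog`, the URNs and the team names are hypothetical. Each team publishes metadata for the assets it owns, and the central catalog only stores and serves what it receives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataEvent:
    """Hypothetical metadata record that an owning team pushes to the catalog."""
    urn: str
    owner_team: str
    description: str
    tags: tuple = ()


class CentralCatalog:
    """Hypothetical lightweight catalog: it stores what teams publish and
    makes it searchable, but does not author or curate metadata itself."""

    def __init__(self):
        self._entries = {}

    def publish(self, event):
        # The latest event for a URN replaces the previous one.
        self._entries[event.urn] = event

    def search(self, term):
        term = term.lower()
        return sorted(
            e.urn
            for e in self._entries.values()
            if term in e.description.lower() or term in e.tags
        )

    def owner_of(self, urn):
        return self._entries[urn].owner_team


catalog = CentralCatalog()

# Each team pushes metadata for its own data product.
catalog.publish(MetadataEvent(
    "urn:example:orders", "checkout-team", "Orders placed by customers", ("sales",)))
catalog.publish(MetadataEvent(
    "urn:example:shipments", "logistics-team", "Shipments per order", ("fulfilment",)))

results = catalog.search("order")
print(results)
print(catalog.owner_of("urn:example:shipments"))
```

Keeping the catalog this thin is one way to reflect the advice above: accountability for the metadata stays with the publishing team, and the shared platform remains an entry point instead of a gatekeeper.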

Apr 2021
Assess

Since we first mentioned data discoverability in the Radar, LinkedIn has evolved WhereHows to DataHub, the next-generation platform that addresses data discoverability via an extensible metadata system. Instead of crawling and pulling metadata, DataHub adopts a push-based model where individual components of the data ecosystem publish metadata via an API or a stream to the central platform. This push-based integration shifts ownership from the central entity to individual teams, making them accountable for their metadata. As more and more companies try to become data-driven, a system that helps with data discovery and with understanding data quality and lineage is critical, and we recommend you assess DataHub in that capacity.

Published : Apr 13, 2021
