May 2020

Marquez is a relatively young open source project for collecting and serving metadata about a data ecosystem. It defines a simple data model that captures metadata such as lineage, upstream and downstream data-processing jobs and their status, and a flexible set of tags describing the attributes of data sets. Its simple RESTful API for managing this metadata eases the integration of Marquez with other tool sets within the data ecosystem.

We've used Marquez as a starting point and easily extended it to fit our needs, such as enforcing security policies and changing its domain language. If you're looking for a small and simple tool to bootstrap the storage and visualization of your data-processing jobs and data sets, Marquez is a good place to start.