May 2020

There are still some tool gaps when applying good software engineering practices in data engineering. Attempting to automate data quality checks between different steps in a data pipeline, one of our teams was surprised when they found only a few tools in this space. They settled on Deequ, a library for writing tests that resemble unit tests for data sets. Deequ is built on top of Apache Spark, and even though it's published by AWS Labs it can be used in environments other than AWS.