Enable javascript in your browser for better experience. Need to know to enable it? Go here.
Published : Apr 03, 2024
Apr 2024
Trial ?

Comparing DataFrames is a common task in data engineering, often done to compare the output of two data transformation approaches to make sure no meaningful deviations or inconsistencies have occurred. DataComPy is a Python library that facilitates the comparison of two DataFrames in pandas, Spark and more. The library goes beyond basic equality checks by providing detailed insights into discrepancies at both row and column levels. DataComPy also has the ability to specify absolute or relative tolerance for comparison of numeric columns as well as known differences it need not highlight in its report. Some of our teams use it as part of their smoke testing suite; they find it efficient when comparing large and wide DataFrames and consider its reports easy to understand and act upon.

Download the PDF

 

 

English | Español | Português | 中文

Sign up for the Technology Radar newsletter

 

Subscribe now

Visit our archive to read previous volumes