The Practical Data Test Grid


The practical test pyramid, while not a prescription, is a simple but powerful way to get fast, economical feedback on software delivery quality.


We apply this type of thinking at Thoughtworks across our delivery work, including when developing data-intensive applications, and the benefits in increased speed and quality are substantial.


Testing data-intensive applications raises a new set of considerations, but we can apply the same type of thinking to guide how and where we invest our effort to get a timely and clear picture of data quality, code quality, or both in combination.


Analogous to the code-testing layers of the practical test pyramid, we consider the following data-testing layers:


  • Point data tests capture a single scenario that can be reasoned about logically. These should be cheap to implement and plentiful, to set expectations in a range of specific circumstances.

  • Sample data tests give us valuable feedback about the data as a whole without processing large data volumes. They allow us to understand fuzzier expectations and variation in data, especially over time. These come with additional complexity and require some tuning of thresholds, but they will uncover issues not captured by point tests. Note that the samples could be synthetic.

  • Global data tests uncover further unanticipated scenarios by testing against all available data, but they are also the least targeted, the most subject to outside changes, and the most computationally expensive. Thus they sit at the top of the data-testing pyramid.
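
As a rough illustration of the three layers, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than part of any particular project: the `clean_reading` transformation, the temperature bounds, the `(id, reading)` record shape, the 5% null-rate threshold and the sample size are all hypothetical.

```python
import random

# Assumed plausible bounds for a temperature reading in Celsius (hypothetical).
VALID_RANGE_C = (-90.0, 60.0)


def clean_reading(reading_c):
    """Hypothetical transformation: null out missing or implausible readings."""
    if reading_c is None:
        return None
    low, high = VALID_RANGE_C
    if not (low <= reading_c <= high):
        return None
    return round(reading_c, 1)


def test_point_cases():
    """Point data tests: single scenarios we can reason about logically."""
    assert clean_reading(21.04) == 21.0     # ordinary value is rounded
    assert clean_reading(-120.0) is None    # implausible value is nulled
    assert clean_reading(None) is None      # missing value stays missing


def sample_null_rate(records, sample_size=200, seed=0):
    """Share of nulls after cleaning a random sample of (id, reading) records."""
    rng = random.Random(seed)  # fixed seed keeps the check repeatable
    sample = rng.sample(records, min(sample_size, len(records)))
    if not sample:
        return 0.0
    cleaned = [clean_reading(reading) for _record_id, reading in sample]
    return sum(value is None for value in cleaned) / len(cleaned)


def check_sample(records, max_null_rate=0.05):
    """Sample data test: a fuzzier expectation with a tunable threshold."""
    return sample_null_rate(records) <= max_null_rate


def check_global(records):
    """Global data test: scan every record, here for duplicate ids."""
    ids = [record_id for record_id, _reading in records]
    return len(ids) == len(set(ids))
```

The point tests pin down exact behaviour for specific inputs, the sample check tolerates some variation but needs its threshold tuned, and the global check is the only one whose cost grows with the full data volume.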


These tests can be applied to data alone, or combined with code tests to verify various stages of data transformation, in which case we would consider the two dimensions as a practical data test grid. Again, this is not a prescription: you needn’t fill every cell, and the boundaries aren’t always precise. But this grid helps direct our testing and monitoring effort for fast and economical feedback on quality in data-intensive systems.
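
One way to picture the grid is as a small lookup from (code layer, data layer) to the tests that live in that cell. The sketch below is only an illustration: the code-layer names `unit`, `integration` and `end_to_end` are an assumed reading of the test pyramid, and the test names in the usage note are hypothetical.

```python
from itertools import product

# Assumed code-testing layers; the article does not name them explicitly.
CODE_LAYERS = ("unit", "integration", "end_to_end")
# Data-testing layers as described in the article.
DATA_LAYERS = ("point", "sample", "global")


def build_grid(tests):
    """Place (name, code_layer, data_layer) entries into a 3x3 grid of cells."""
    grid = {cell: [] for cell in product(CODE_LAYERS, DATA_LAYERS)}
    for name, code_layer, data_layer in tests:
        # An unknown layer name raises KeyError, which flags a typo early.
        grid[(code_layer, data_layer)].append(name)
    return grid


def empty_cells(grid):
    """List cells with no tests, as a prompt for discussion, not a to-do list."""
    return [cell for cell, names in grid.items() if not names]
```

For example, `build_grid([("rounding_of_single_reading", "unit", "point"), ("null_rate_after_ingest", "integration", "sample")])` fills two cells and leaves seven empty. Empty cells are not failures; they simply make visible where effort has and has not been invested.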


If you’d like to learn more about data testing in practice, please get in touch.

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
