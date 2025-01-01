By David Tan and Mitchell Lisle

Approaching the development of data products as you would approach building software is a good starting point. But data products are typically more complicated than software applications because they are both software- and data-intensive. Not only do teams have to navigate the many components and tools that are part and parcel of software development, they also need to grapple with the complexity of data. Given this additional dimension, it is all too easy for teams to get mired in cumbersome development processes and production deployments, leading to anxiety and release delays.

At Thoughtworks, we find that intentionally applying “sensible default” engineering practices allows us to deliver data products sustainably and at speed. In this article, we’ll dive into how this can be done.

Applying sensible defaults in data engineering



Many sensible default practices have their roots in continuous delivery (CD), a set of software development practices that enables teams to release changes to production safely, quickly and sustainably. These practices reduce the risk of errors in releases, shorten time to market, lower costs and ultimately improve product quality. Continuous delivery practices (such as automated build and deployment pipelines, infrastructure as code, continuous integration and trunk-based development) also correlate positively with an organization's software delivery and business performance.
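The gating idea behind an automated deployment pipeline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real CI/CD tool: the stage names and the `run_pipeline` and `deploy` functions are hypothetical, and real pipelines are usually defined in a CI service's own configuration. The point is that every change passes through the same ordered checks, and a failing check stops the release before it reaches production.

```python
from typing import Callable, List, Tuple

# A stage is a (name, check) pair; the check returns True when the stage passes.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failing one.

    Returns the names of the stages that actually ran, so a caller can see
    whether the final (deploy) stage was ever reached.
    """
    executed: List[str] = []
    for name, check in stages:
        executed.append(name)
        if not check():
            print(f"Stage '{name}' failed; later stages were skipped.")
            break
    return executed


if __name__ == "__main__":
    releases: List[str] = []

    def deploy() -> bool:
        # Hypothetical deploy step: records a release instead of deploying.
        releases.append("v1")
        return True

    pipeline: List[Stage] = [
        ("build", lambda: True),
        ("unit-tests", lambda: True),
        ("data-tests", lambda: False),  # simulate a failing data check
        ("deploy", deploy),
    ]
    print(run_pipeline(pipeline))
    print("releases:", releases)
```

In this run the simulated data-test failure means `deploy` never executes, so `releases` stays empty: the pipeline, not a person, decides whether a change is fit to ship.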

From development and deployment to operation, sensible default practices help us build the thing right. These practices include:



Trunk-based development

Test-driven development

Pair programming

Build security in

Fast automated build

Automated deployment pipeline

Quality and debt effectively managed

Build for production

As we will elaborate later in this article, these practices are essential for managing the complexity of modern data stacks and accelerating value delivery, because they give teams the following characteristics, which help them deliver quality at speed:

Fast feedback: Find out whether a change has been successful in moments, not days, whether that means knowing your unit tests have passed, that you haven't broken production, or that a customer is happy with what you've built.



Simplicity: Build for what you need now, not for what you think might be coming. This limits complexity while keeping your software easy to change rapidly as new requirements arrive.



Repeatability: Gain the confidence and predictability that come from removing manual tasks that can introduce inconsistencies, so you can spend time on what matters rather than on troubleshooting.
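Fast feedback and repeatability often start with small, automated tests around pure transformation logic. The sketch below assumes a hypothetical `total_sales_by_region` transformation over in-memory records; the function, field names and test cases are illustrative only. Because the tests need no database, cluster or manual setup, they run in moments and give the same result every time.

```python
from typing import Any, Dict, List

# Hypothetical input shape: one dict per sales record.
Row = Dict[str, Any]


def total_sales_by_region(rows: List[Row]) -> Dict[str, float]:
    """Sum the 'amount' field for each 'region' (illustrative transformation)."""
    totals: Dict[str, float] = {}
    for row in rows:
        region = str(row["region"])
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return totals


def test_sums_amounts_per_region() -> None:
    rows = [
        {"region": "APAC", "amount": 10.0},
        {"region": "EMEA", "amount": 5.0},
        {"region": "APAC", "amount": 2.5},
    ]
    assert total_sales_by_region(rows) == {"APAC": 12.5, "EMEA": 5.0}


def test_empty_input_gives_empty_result() -> None:
    assert total_sales_by_region([]) == {}


if __name__ == "__main__":
    # No external systems or manual steps are involved, so every run is
    # fast and deterministic. A runner such as pytest could also collect
    # the test_* functions if the file follows its naming conventions.
    test_sums_amounts_per_region()
    test_empty_input_gives_empty_result()
    print("all tests passed")
```

Keeping the transformation a pure function (input in, output out) is what makes this kind of test cheap to write and quick to run.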

Engineering practices for modern data engineering



While there is a rich body of work detailing how you can apply continuous delivery when developing software solutions, much less is documented about how you can use these practices in modern data engineering. Here are three ways we’ve adapted these practices to build and deliver effective data products, fast.



1. Test automation and test data management



Test automation is the key to fast feedback, as it allows teams to evolve their solution without the bottlenecks caused by manual testing and production defects. In addition to the well-known practice of test-driven development (writing tests first to guide software development), it's also important to consider data tests.
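Data tests differ from code tests: rather than checking transformation logic against fixed inputs, they check properties of the data itself as it flows through a pipeline. The following is a minimal plain-Python sketch of that idea. The orders dataset, its `order_id` and `amount` columns, the range bounds and all helper names are hypothetical; in practice teams often reach for purpose-built data-testing tools rather than hand-rolled checks like these.

```python
from typing import Any, Dict, List, Set

# Hypothetical record shape for an orders dataset.
Row = Dict[str, Any]


def check_not_null(rows: List[Row], column: str) -> List[str]:
    """Flag rows where the column is missing or None."""
    return [
        f"row {i}: '{column}' is null"
        for i, row in enumerate(rows)
        if row.get(column) is None
    ]


def check_unique(rows: List[Row], column: str) -> List[str]:
    """Flag repeated non-null values in the column."""
    seen: Set[Any] = set()
    failures: List[str] = []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value is None:
            continue  # nulls are reported by check_not_null
        if value in seen:
            failures.append(f"row {i}: duplicate '{column}' value {value!r}")
        seen.add(value)
    return failures


def check_range(rows: List[Row], column: str, low: float, high: float) -> List[str]:
    """Flag non-null values that fall outside [low, high]."""
    failures: List[str] = []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value is not None and not (low <= value <= high):
            failures.append(
                f"row {i}: '{column}' value {value!r} outside [{low}, {high}]"
            )
    return failures


def run_data_tests(rows: List[Row]) -> List[str]:
    """Apply hypothetical expectations for an orders dataset."""
    return (
        check_not_null(rows, "order_id")
        + check_unique(rows, "order_id")
        + check_range(rows, "amount", 0, 1_000_000)
    )


if __name__ == "__main__":
    batch = [
        {"order_id": 1, "amount": 20.0},
        {"order_id": 2, "amount": -5.0},    # negative amount
        {"order_id": 2, "amount": 15.0},    # duplicate order_id
        {"order_id": None, "amount": 7.0},  # missing order_id
    ]
    for failure in run_data_tests(batch):
        print(failure)
```

Returning a list of failures, rather than raising on the first one, lets a pipeline report every problem in a batch at once; whether a given failure should block a release or only raise an alert is a decision for the team.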

Much like the practical test pyramid for software delivery, the practical test data grid (Figure 2) helps guide where and how you invest your effort to get a clear, timely picture of data quality, code quality, or both. The grid considers the following data-testing layers: