A lean data product

You might have heard that you need a fully functional data platform to build a data product. But building out a data platform requires a large upfront investment and carries significant risk. So we ask: where no suitable platform currently exists, can we build data products that provide real customer or business value in the short term while remaining extensible towards a shared platform in the medium to long term? A combination of lean product development principles and data mesh might hold the key to low-risk, fast delivery with future product extensibility. In this blog, we’ll share our experience of employing lean principles to develop and deliver a data product.

 

Once a business identifies that a new requirement could be fulfilled by a data product, we enter a phase of discovery and data exploration. With the help of domain experts, we can refine the business requirements to identify the data sources that the product requires. We can also refine the features and cross-functional requirements of the consuming domain data product and the source domain products, and align on the intended business value. For example, in the exploratory phase of a recent project to build a financial services product, domain experts identified that only 7 of the 150 tables in the source database were necessary to deliver the minimum requirements. Limiting ingestion to those tables reduced the effort involved, because we did not process or clean data the product did not need.
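As a minimal sketch of this kind of scoping, assuming the agreed tables are kept as an explicit allowlist, the Python below selects only the required tables and fails fast if one of them is missing from the source. The table names and the `tables_to_ingest` helper are hypothetical illustrations, not details from the project.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical allowlist agreed with domain experts; names are illustrative only.
REQUIRED_TABLES: FrozenSet[str] = frozenset({"accounts", "customers", "transactions"})


def tables_to_ingest(
    available: Iterable[str],
    required: FrozenSet[str] = REQUIRED_TABLES,
) -> List[str]:
    """Return the required tables in sorted order.

    Raises ValueError if any required table is absent from the source,
    so a scoping mistake surfaces before any data is moved.
    """
    missing = required - set(available)
    if missing:
        raise ValueError(f"required tables missing from source: {sorted(missing)}")
    return sorted(required)


if __name__ == "__main__":
    # Hypothetical source catalogue: only the allowlisted tables are selected.
    source_tables = ["accounts", "audit_log", "customers", "sessions", "transactions"]
    print(tables_to_ingest(source_tables))
```

Keeping the allowlist in one place makes the scope of the data product explicit and easy to review when requirements change.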

 

Once stakeholders are aligned on the form of the product and the value it should deliver, we start product development. To prove the concept and see it in action, we may build a Minimum Viable Product (MVP). Scoping an MVP can be difficult; however, a good indicator is that it should be the thinnest ‘vertical’ slice that provides feedback about the viability of an idea or product. The MVP should be close enough to our final product from a customer perspective to validate our concept, but it does not have to be operationally efficient, provided that operational concerns have been considered. The MVP will also uncover potential edge cases, hidden or missed product opportunities and possible obstacles. This early feedback reduces the risk of product failure before we commit to full development; the sooner challenges, blockers or gaps in analysis are found, the better.

 

The MVP phase defines data sources and transformations that can be reused in the full production delivery phase. Our main focus for the production delivery phase was therefore on implementing well-governed transformations on supportable and extensible data infrastructure, and on serving the results to data product consumers. Key DevOps/DataOps practices were the use of infrastructure as code and dedicated observability capabilities in both the control and data planes, delivered end-to-end with CI/CD.
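This article does not describe how observability was implemented. As one hedged illustration of observability in the data plane, the Python sketch below wraps a transformation step and records row counts and duration. `StepMetrics`, `run_observed` and the sample cleaning step are hypothetical names; a real pipeline would typically send these values to its monitoring system instead of returning them.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class StepMetrics:
    """Basic measurements for one run of a transformation step."""
    step: str
    rows_in: int
    rows_out: int
    seconds: float


def run_observed(
    step: str,
    transform: Callable[[List[Row]], List[Row]],
    rows: List[Row],
) -> Tuple[List[Row], StepMetrics]:
    """Run `transform` on `rows` and return its output together with metrics."""
    started = time.perf_counter()
    result = transform(rows)
    elapsed = time.perf_counter() - started
    return result, StepMetrics(step, len(rows), len(result), elapsed)


def drop_rows_without_amount(rows: List[Row]) -> List[Row]:
    """Hypothetical cleaning step: keep only rows that have an amount."""
    return [row for row in rows if row.get("amount") is not None]


if __name__ == "__main__":
    sample = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
    cleaned, metrics = run_observed(
        "drop_rows_without_amount", drop_rows_without_amount, sample
    )
    print(metrics.step, metrics.rows_in, metrics.rows_out)
```

An unexpected change in the ratio of `rows_out` to `rows_in` between runs is the kind of signal that metrics like these make visible.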

 

The result was that we delivered consumable, fully automated and comprehensively governed data products with no dependency on a centralized data platform. Building in alignment with prevailing data engineering standards and carefully defining the boundaries of the data product provide a clear path to integrating this single data product into a future data mesh. We addressed observability, a key concern for federated governance, because it provided immediate value. We could introduce an embedded data product self-description for discoverability when more data products are introduced, and compute and storage could be obtained from a self-service platform in due course. However, that is beyond the scope of this blog.

 

In summary, by applying lean principles we built a single data product that can immediately provide customer and business value, while explicitly considering how it will become part of a data mesh in the future.

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
