Data mesh: it's not just about tech, it's about ownership and communication

Part 2

Pablo Giner, Oscar Torres Fernandez, Diana Pinto and Javier García Hernández

Published: December 19, 2022

In the first article in this series, we explored how the multi-category app Glovo began its Data Mesh journey, which culminated in building its first data product. In this piece, we’ll examine the challenges it faced as it continued that journey, as more domains created data products and complexity began to increase.

The first data product delivered was for the customer behavior domain. To grow the Data Mesh strategy, the data teams at Glovo needed to create more data products across the other domains. The challenge was how to scale from one team that had developed a single data product to multiple teams developing separate data products, while continuing to drive improvements.

The solution uses Amazon Managed Workflows for Apache Airflow (MWAA) to orchestrate workflows with minimal maintenance and setup overhead. An Amazon EMR cluster was used for processing large datasets, while AWS Glue was used for metadata management. Amazon S3 was used as the storage option for some output ports.

The adoption phase

A handful of data products was relatively easy to handle. There were no dependencies between them, so little alignment was needed and their progress was easy to track. But as more data products were developed, managing them became more complex.

To understand this challenge, it helps to consider how those data products would be delivered. For instance, while one domain team was building a new data product, it could need to ingest data from data products being built by other teams. That meant domain teams needed detailed alignment on their roadmaps during the planning, development and delivery phases.
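As a minimal sketch of why this alignment matters, the dependencies between data products can be modeled as a graph and ordered so that every product is delivered after the ones it ingests from. The product names below are hypothetical and chosen only for illustration; they are not Glovo’s actual data products.

```python
from graphlib import TopologicalSorter

# Hypothetical data products mapped to the data products they ingest from.
# The names are illustrative assumptions, not taken from the article.
DEPENDENCIES = {
    "customer_behavior": set(),
    "orders": set(),
    "courier_performance": {"orders"},
    "marketing_attribution": {"customer_behavior", "orders"},
}


def delivery_order(dependencies):
    """Return data products ordered so each comes after the ones it ingests."""
    return list(TopologicalSorter(dependencies).static_order())


def upstream_products(product, dependencies):
    """Return every direct or transitive upstream product of `product`."""
    seen = set()
    stack = list(dependencies.get(product, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dependencies.get(current, ()))
    return seen


if __name__ == "__main__":
    print(delivery_order(DEPENDENCIES))
    print(upstream_products("marketing_attribution", DEPENDENCIES))
```

The set returned by `upstream_products` is, in effect, the list of teams whose roadmaps a domain team would need to align with before committing to a delivery date.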

Within three months, there were four separate domain teams creating their own data products.

Technical challenges

Each team started to add logic to their data using different technologies (like PySpark or DBT), all of them running in AWS. At the time, the platform didn’t support some of these libraries, so the data teams had to build their own abstraction layers to provide that support until the platform team could add the required capabilities. Without those layers, the libraries couldn’t be used.

Meanwhile, data engineering teams faced dependencies before they could even start developing new data products: they had to request permissions from the platform team and set up the whole environment. In addition, these teams had their own approvals process, which again depended on the platform engineers.

All these processes took a significant amount of time, becoming a bottleneck and increasing the data products’ cycle time. This forced the teams to reshape the processes in order to stabilize their data products’ delivery time.

Operational challenges

Until the Data Mesh was fully established and operational, many of the teams needed to maintain the legacy data warehouse and develop data products at the same time.

So as the teams moved towards establishing the Data Mesh, Thoughtworks worked closely with Glovo’s data product managers, spreading their operational and business knowledge through the data teams. In this context, the new data teams set their own structures, ways of working, ceremonies and backlogs, while keeping their autonomy.

Given that data teams in the different clusters needed to scale fast, an enablement team was created, made up of Thoughtworkers and Glovo team members. Its purpose was to support the new teams and ease the creation of data products.

How to enable others in a Data Mesh context

While it’s relatively easy to say ‘let’s create an enablement team’, you might want to pause before leaping in. The word itself, enablement, can be a problem. To some, enablement might mean a team that jumps into a domain, solves any problems and then leaves. What we mean is enablement as a mechanism that lets domain teams perform certain tasks themselves, empowering them to assess any problems and come up with their own solutions.

At Glovo, the enablement team proposed a four-phase journey methodology:

Assessment. The enablement team collaborates with the domain team to understand their skill levels and their roadmap.
Enablement. The team works alongside other domain developers as part of an ongoing delivery engagement to enable them on the Data Mesh competencies.
Empower. The team works alongside anchor/lead archetype developers to empower them to drive large-scale solutioning.
Sustain. The team takes a hands-off approach and domain developers take the lead on all aspects of business-as-usual (BAU) development, with the objective of creating a self-sustaining, independent team.

Depending on the case, we needed to be flexible in how we applied those steps, adapting them to the team structure, people’s ways of working and so on, while always looking to maximize the teams’ outcomes.

But Glovo is big, very big, and it’s unrealistic to expect a single team to quickly enable all the tech teams. That’s why Glovo chose to fully deprecate its legacy system, both to narrow its platform teams’ span of focus and to reduce complexity for its users. The strategy was to follow the strangler pattern: first putting in place the data products’ use cases, prioritized on the basis of anticipated impact, and then measuring the load shift between the legacy and Data Mesh systems.
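One simple way to think about measuring that load shift is as the share of total usage already served by data products rather than by the legacy system. The sketch below assumes usage is counted as requests per week and uses invented numbers and an invented retirement threshold; the article does not say how Glovo actually measured it.

```python
def load_shift(legacy_requests, mesh_requests):
    """Return the fraction of total load already served by data products."""
    total = legacy_requests + mesh_requests
    if total == 0:
        return 0.0
    return mesh_requests / total


def legacy_can_be_retired(share, threshold=0.95):
    """Hypothetical rule: retire a legacy use case once the mesh share passes a threshold."""
    return share >= threshold


# Hypothetical weekly request counts for one use case: (legacy, data mesh).
WEEKLY_COUNTS = [(900, 100), (600, 400), (150, 850)]
SHARES = [round(load_shift(legacy, mesh), 2) for legacy, mesh in WEEKLY_COUNTS]

if __name__ == "__main__":
    for week, share in enumerate(SHARES, start=1):
        print(f"week {week}: {share:.0%} on data mesh, retire legacy: {legacy_can_be_retired(share)}")
```

Tracking this share per use case, rather than for the platform as a whole, fits the strangler approach of replacing the legacy system one use case at a time.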

To accelerate the creation of new data products, the customer behavior data team, in conjunction with the data platform team, created a “data platform kit”. This is a set of tools, along with recommendations for using the tools in a consistent way, to help teams develop data products faster, by providing, for example, boilerplate structures to start coding.

Further initiatives to help mitigate certain risks, such as a lack of documentation and knowledge sharing, included the creation of a “data product playbook”. This outlined all the stages that a data product can go through. We also created a catalog listing all data products currently available in the production environment, with additional information relevant to the company and to specific teams.

Defining data quality and standards

One of the main reasons to adopt the Data Mesh approach was to increase data quality and set standards across the data teams. A new governance workgroup was formed to support these goals.

Glovo started to continuously monitor the data teams’ performance, offering agile solutions, suggestions and improvements. Data, platform and governance teams started to meet weekly to share their implementation updates and their roadmaps. This new structure enabled them to grow faster and better, improving measures such as cycle time.

At the same time, the governance board implemented automated data quality checks. These included validations of the accuracy, completeness, timeliness and uniqueness of the data. The board also started to introduce data security levels that each team needed to follow. For example, under GDPR, if a client exercises the right to be forgotten, our data should be structured in a clear and transparent way so we can respond in a timely manner. These metrics were enabled in a uniform fashion for all data products.
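To make the four quality dimensions concrete, here is a minimal sketch of how such checks could be expressed over rows of a data product. The column names, the sample rows, the one-day freshness window and the accuracy rule are all assumptions made for illustration; the article does not describe the tooling or thresholds the governance board actually used.

```python
from datetime import datetime, timedelta, timezone


def completeness(rows, column):
    """Fraction of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(column) is not None) / len(rows)


def uniqueness(rows, key):
    """Ratio of distinct `key` values to row count (1.0 means no duplicates)."""
    if not rows:
        return 0.0
    return len({row[key] for row in rows}) / len(rows)


def timeliness(rows, column, now, max_age):
    """Fraction of rows whose timestamp in `column` is no older than `max_age`."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if now - row[column] <= max_age) / len(rows)


def accuracy(rows, rule):
    """Fraction of rows that satisfy `rule`, a predicate encoding a business constraint."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if rule(row)) / len(rows)


# Hypothetical sample rows from an invented "orders" data product.
NOW = datetime(2022, 12, 19, 12, 0, tzinfo=timezone.utc)
ROWS = [
    {"order_id": 1, "amount": 12.5, "updated_at": NOW - timedelta(hours=1)},
    {"order_id": 2, "amount": None, "updated_at": NOW - timedelta(hours=2)},
    {"order_id": 2, "amount": 8.0, "updated_at": NOW - timedelta(days=3)},
    {"order_id": 4, "amount": -3.0, "updated_at": NOW - timedelta(hours=5)},
]


def run_checks(rows, now):
    """Run all four checks and return their scores keyed by dimension."""
    return {
        "completeness": completeness(rows, "amount"),
        "uniqueness": uniqueness(rows, "order_id"),
        "timeliness": timeliness(rows, "updated_at", now, timedelta(days=1)),
        # Assumed rule: an order amount must be present and non-negative.
        "accuracy": accuracy(rows, lambda r: r["amount"] is not None and r["amount"] >= 0),
    }


if __name__ == "__main__":
    print(run_checks(ROWS, NOW))
```

Expressing every check as a score between 0 and 1 is one way to apply the same metrics uniformly across data products, with each team choosing only its columns and rules.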

The path is clear but the journey isn’t finished

We have made many advances in our Data Mesh implementation. And while the fundamentals of the strategy are now in place, there is still room for improvement.

Now that the team structure and the technology are in good shape, we’re focusing on other areas where we can optimize our approach. These include the following three main areas:

Data availability, quality and data generation

Making data discoverable. Every team must be aware of the data that’s available, both to use directly and to build new data products on. To achieve that, we are building a data catalog that will include all the data products available for consumption.
Applying engineering operational excellence standards to Data Mesh, meaning that no quality issue goes unnoticed.
Continuous improvement. Measuring the outcomes of the data products that are in production will help to identify areas for improvement and to assess more accurately the results they bring.
Moving beyond batch to streaming data.

Data Products’ development

Define which feature is meant to be a data product and how to prioritize its development. As we mentioned before, as the different domains grow and the Data Mesh approach takes hold in the teams, a good process is needed for requesting the creation of data products from the data teams. Without this, they will face a huge number of requests that could be difficult to manage and prioritize. At this moment, the data teams are involved in defining that process. It means that when a tech team requests the creation of a data product, the data product managers evaluate its impact, its usability and the value it will provide to the rest of the tech teams before development starts. With this approach, the data teams can ensure they’re working on the right product and making the best use of their time. Defining this process also helps to better understand and analyze the real need for specific data products, thus avoiding the trap of over-engineering.
Building an abstraction layer in our platform to decrease cycle time and increase accessibility. By simplifying the developer journey, you can give access to less technical profiles. At Glovo, we envision that Data Analysts will develop the simpler Data Products.
Glovo is already in the process of creating a Data Mesh cost data product, in which logs will be tagged so that they identify the domain, the data product and the corresponding service. This visibility into the costs of creating and maintaining data products will give significant insights into how to reduce the costs arising from the use of logs as part of the AWS infrastructure.
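The tagging idea behind the cost data product can be sketched as a roll-up of tagged records by any combination of tags. The three tag keys follow the text (domain, data product, service); the record layout, the `cost_usd` field and all values are hypothetical and are not taken from Glovo’s implementation or from any AWS billing format.

```python
from collections import defaultdict

# Hypothetical tagged cost records; every value here is invented for illustration.
COST_RECORDS = [
    {"domain": "customer_behavior", "data_product": "sessions", "service": "emr", "cost_usd": 40.0},
    {"domain": "customer_behavior", "data_product": "sessions", "service": "s3", "cost_usd": 5.0},
    {"domain": "orders", "data_product": "order_facts", "service": "emr", "cost_usd": 25.0},
    {"domain": "orders", "data_product": "order_facts", "service": "mwaa", "cost_usd": 10.0},
]


def cost_by(records, *tags):
    """Sum `cost_usd` grouped by the given tag names, keyed by a tuple of tag values."""
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[tag] for tag in tags)
        totals[key] += record["cost_usd"]
    return dict(totals)


if __name__ == "__main__":
    print(cost_by(COST_RECORDS, "domain"))
    print(cost_by(COST_RECORDS, "service"))
    print(cost_by(COST_RECORDS, "domain", "data_product", "service"))
```

Because every record carries all three tags, the same data can answer both "which domain spends the most?" and "which service drives that spend?" without any extra bookkeeping.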

Improve teams’ ownership and ways of working

Pushing source-aligned data products ownership upstream: at the moment, only data teams are creating data products, which is not fully compliant with Data Mesh. The goal is that engineering teams get enough ownership to also develop data products in the near future.
Implementing metrics that relate not only to the performance of the data itself but also to the stability of the team’s workflow will help to deliver more reliable data products faster and with fewer errors. Some data teams have already adopted these metrics, measuring deployment frequency, lead time for changes, mean time to recover and change failure rate (the DORA metrics).
Improving knowledge sharing so that knowledge is not lost in unforeseen situations. Finding ways to make everyone aware of the work that is being done, and its status, is critical. This can be done through pair programming or knowledge sharing sessions with the team. As an example of these sessions, a Data Mesh guild was created. The guild allows engineers within each cluster to meet frequently, discuss progress and technical aspects of Data Mesh and data product development, and surface areas for collaboration and decision early. These initiatives have had a real impact.
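The four workflow metrics mentioned above (deployment frequency, lead time for changes, mean time to recover and change failure rate) can be computed from a simple log of deployments. The sketch below is one possible formulation under stated assumptions: the `Deployment` record, its field names and the sample data are hypothetical, and teams often define these metrics slightly differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one data product deployment."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    recovered_at: Optional[datetime] = None


def dora_metrics(deployments, period_days):
    """Compute the four DORA metrics over a non-empty list of deployments."""
    if not deployments:
        raise ValueError("at least one deployment is required")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.recovered_at - d.deployed_at for d in failures if d.recovered_at is not None]
    return {
        "deployment_frequency_per_day": len(deployments) / period_days,
        "mean_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        # None when no failed deployment has been recovered yet.
        "mean_time_to_recover": sum(recoveries, timedelta()) / len(recoveries) if recoveries else None,
    }


# Hypothetical four-week sample: four deployments, one of which failed.
START = datetime(2022, 11, 1, 9, 0)
DEPLOYMENTS = [
    Deployment(START, START + timedelta(hours=2)),
    Deployment(START + timedelta(days=7), START + timedelta(days=7, hours=4)),
    Deployment(
        START + timedelta(days=14),
        START + timedelta(days=14, hours=6),
        failed=True,
        recovered_at=START + timedelta(days=14, hours=9),
    ),
    Deployment(START + timedelta(days=21), START + timedelta(days=21, hours=4)),
]

if __name__ == "__main__":
    for name, value in dora_metrics(DEPLOYMENTS, period_days=28).items():
        print(f"{name}: {value}")
```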

What’s next for Glovo’s Data Mesh journey?

Last year, Glovo stepped on the accelerator and is growing into the Data Mesh space. For a product-centric company like Glovo, this will bring a lot of benefits. It’s amazing to look back in time and see how much Glovo has accomplished since the early days, when Data Mesh was just a concept. Now, they have a solid structure and are advancing in the development of different data products (more than 30 so far this year) that will boost the decision-making process based on more accurate and trusted data.

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
