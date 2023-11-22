When designing the solution for this space, we first had to consider a classic problem in data: the privacy vs. utility trade-off. Although a range of privacy techniques exist, there is no ‘one size fits all’ solution; different privacy technologies and approaches are appropriate for different use cases. For instance, anonymization may be useful for regular reporting on specific metrics, but it may prevent a data scientist from creating a personalized recommendation algorithm for customers. When deciding on an approach, an “acceptable” trade-off has to consider all sides: the company’s legal position and risk appetite, the use case you’re tasked with solving and, most importantly, the sensitivity of the data.

We therefore needed our architectural approach to provide flexibility for a range of techniques depending on the use case.

Today’s advantage: The market has changed

As mentioned previously, a wide range of sophisticated techniques exist. These include federated machine learning and analytics, which move the algorithm to the data rather than the data to the center, as well as advances in mathematical techniques for encryption and anonymization, such as differential privacy. These techniques have advanced massively over the last five to ten years and are being used by tech behemoths including Google and Apple.
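
To make "moving the algorithm to the data" concrete, here is a minimal sketch of federated analytics with an optional differential privacy step. It is an illustration under stated assumptions, not the architecture described in this article: the names LocalSummary, local_summary, private_local_summary and federated_mean are hypothetical, the record count is treated as public, and neighboring datasets are assumed to differ by replacing one record, so the clipped sum has sensitivity (upper - lower).

```python
import random
from dataclasses import dataclass


@dataclass
class LocalSummary:
    """Aggregate a data holder is willing to release (hypothetical type)."""
    total: float
    count: int


def local_summary(values, lower, upper):
    """Runs where the data lives: clip each value and return only aggregates."""
    clipped = [min(max(v, lower), upper) for v in values]
    return LocalSummary(total=sum(clipped), count=len(clipped))


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_local_summary(values, lower, upper, epsilon, rng):
    """Laplace mechanism on the clipped sum.

    Assumption: the count is public and neighboring datasets differ by
    replacing one record, so the sum's sensitivity is (upper - lower).
    """
    summary = local_summary(values, lower, upper)
    scale = (upper - lower) / epsilon
    return LocalSummary(summary.total + laplace_noise(scale, rng), summary.count)


def federated_mean(summaries):
    """Runs centrally: combines per-site aggregates; raw rows never arrive here."""
    total = sum(s.total for s in summaries)
    count = sum(s.count for s in summaries)
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    sites = [[10.0, 20.0, 30.0], [40.0, 50.0]]  # made-up values for two sites
    exact = federated_mean([local_summary(v, 0.0, 100.0) for v in sites])
    noisy = federated_mean(
        [private_local_summary(v, 0.0, 100.0, epsilon=1.0, rng=rng) for v in sites]
    )
    print("exact mean:", exact)
    print("noisy mean:", noisy)
```

Only the per-site sums and counts leave each source. A smaller epsilon adds more noise, which is the privacy vs. utility trade-off in miniature.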

Of course, their sophistication may be off-putting given the skill set needed to develop these techniques, particularly when seemingly simple privacy solutions such as third-party “clean rooms” exist.

However, we’re also seeing these techniques become more accessible thanks to projects such as Flower Labs. Given the new and open availability of these technologies, such as those developed by Flower, and the heightened risk of sharing highly sensitive data with third-party clean room solutions, we decided to develop an architecture that takes advantage of them to increase privacy. This enables further risk mitigation without increasing cost.

How does this work now?

Suppose a user (a data scientist, an analyst or someone else) needs access to data from different sources to answer a question. These sources might be databases in different departments, databases in different organizations entirely or a million Internet of Things (IoT) devices. In many organizations, each data producer would develop an API so others can pull data, supported by a centralized data platform that makes self-service easier. However, where there are serious concerns about sharing this data beyond certain boundaries, this becomes challenging. One common pattern we’ve seen is a request system in which experts determine whether they can fulfill a given data sharing request based on predetermined sharing agreements. We’ve observed this at a range of organizations, including large banks with a Data Mesh, but for our proof of concept (PoC) we focused in particular on the challenge facing UK public sector organizations.
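
The request pattern described above can be sketched as a simple check of a request against a predetermined sharing agreement. This is a hypothetical illustration only: the SharingAgreement and AccessRequest types, their fields, and the approve/deny/escalate outcomes are assumptions made for the example, not a description of any specific organization's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharingAgreement:
    """Hypothetical record of what one data producer has agreed to share."""
    dataset: str
    allowed_requesters: frozenset
    allowed_purposes: frozenset
    allowed_fields: frozenset


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical request raised by a data scientist or analyst."""
    requester: str
    dataset: str
    purpose: str
    fields: frozenset


def review_request(request, agreements):
    """Return (decision, reason); unclear cases are escalated to a human expert."""
    agreement = agreements.get(request.dataset)
    if agreement is None:
        return "escalate", "no sharing agreement covers this dataset"
    if request.requester not in agreement.allowed_requesters:
        return "deny", "requester is not party to the agreement"
    if request.purpose not in agreement.allowed_purposes:
        return "escalate", "purpose is outside the agreed purposes"
    extra = request.fields - agreement.allowed_fields
    if extra:
        return "deny", "fields not covered: " + ", ".join(sorted(extra))
    return "approve", "request is within the agreement"


if __name__ == "__main__":
    # Made-up agreement and request, for illustration only.
    agreements = {
        "dept_a.visits": SharingAgreement(
            dataset="dept_a.visits",
            allowed_requesters=frozenset({"dept_b"}),
            allowed_purposes=frozenset({"service planning"}),
            allowed_fields=frozenset({"region", "month", "visit_count"}),
        )
    }
    request = AccessRequest(
        requester="dept_b",
        dataset="dept_a.visits",
        purpose="service planning",
        fields=frozenset({"region", "visit_count"}),
    )
    print(review_request(request, agreements))
```

Even in this simplified form, every new purpose or field means another round of review, which is why the pattern becomes hard to scale as the number of sources grows.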