Brief summary
A changing regulatory environment has made it more important than ever for organizations to embed privacy in their data infrastructure. Doing so, however, can be complicated, which means data scientists have a vital role to play in ensuring privacy is a key concern from both a technical and a commercial perspective.
Thoughtworker and data scientist Katharine Jarmul is eager to help fellow data scientists master privacy principles and techniques. Her new book, Practical Data Privacy, covers everything from the fundamentals of governance and anonymization through to advanced approaches to data privacy like federated learning and encrypted computation.
In this episode of the Technology Podcast, Katharine joins hosts Rebecca Parsons and Birgitta Böckeler to discuss the book and explain why data scientists need to be on the frontline in the fight for privacy.
Episode transcript
[MUSIC PLAYING]
Rebecca Parsons: Hello, everyone. My name is Rebecca Parsons. And I'd like to welcome you to another edition of the Thoughtworks Technology Podcast. And I am joined today by one of my co-hosts, Birgitta.
Birgitta Böckeler: Hi. This is Birgitta Böckeler. I'm a technical principal with Thoughtworks in Berlin, Germany.
Rebecca: And our special guest today is Katharine, who has a new book almost coming out, getting very close, on practical data privacy. So Katharine, would you please introduce yourself?
Katharine Jarmul: Yeah. No problem. Thank you so much for having me.
My name is Katharine Jarmul. I'm a principal data scientist at Thoughtworks Germany. And my focus for the past five, six years has been on data privacy in the area of machine learning, or what we would now call AI, and data science. And the upcoming book, which I'm excited to talk about, is coming out in May 2023 from O'Reilly and is called Practical Data Privacy. I believe the tagline is "enhancing privacy and security in data."
Rebecca: So why don't you start by telling us a little bit about the book, the approach that you took to the subject, and who you're trying to reach?
Katharine: Yeah. So the book itself was, I guess, first conceived by O'Reilly, who gave me a call because I had run some trainings on data privacy for them. I think what we're seeing in the world of generative AI, but also some of the trends we're seeing globally around data protection regulation, were starting to shift the amount of interest that O'Reilly was seeing around data privacy as a topic. They were aware of this, and they were aware that there was a bit of a gap in their library, which is why they really wanted a text that I was very, very happy to write: a book for data scientists by data scientists. So for somebody who's practically doing data science work every day, whether it involves machine learning or not: how do we actually build privacy into these systems? Because a lot of these systems currently are not built with privacy as a first-class citizen, so to speak, of the data science process.
And that's been a lot of my work for, again, the past five or six years: thinking about how we build privacy into the data science process, and how we as a data science and machine learning community learn to use these tools, because a lot of us don't come from a background where we were necessarily trained on principles like privacy by design. So I think technologists in general can learn from it, but it's specifically geared towards data scientists, teaching them about privacy technologies and data privacy as a concept so they can build them into real data science workflows.
Birgitta: So is it fair to say then that it's all about navigating that big tradeoff: as a data scientist, I need all of this data to get the best insights out of it, but there's also a lot of that data I actually don't even want to see, and don't want my model to see, because I want to preserve privacy? And then you constantly have to navigate that tradeoff somehow. Is that fair to say?
Katharine: Yeah. I mean, I think that's a general theme in the book, and I really like that you point it out. Sometimes we have this false mindset that privacy is either on or off. Early on in the book, I introduce the idea of a privacy-utility, or privacy-information, continuum: we can find ourselves anywhere along it, between the amount of privacy we'd like to offer the people in the data set and the amount of information we'd like to gather from the data. The book keeps coming back to this theme of figuring out how we choose a good point along that continuum and improve it over time, hopefully using technologies that work really well, so that it ends up being a bit of a win-win rather than a full tradeoff or compromise.
Rebecca: The title of the book really speaks to me with that focus on practical. Tell me a little bit more about how that informed your thinking about the kind of content you would put in, in addition to the way it informs your thinking about the field in general.
Katharine: Yeah. Another reason the book exists, or why I was really excited to write it, is that when I started getting into this topic, a lot of the content available was either very, very high level and general, and one was left thinking, OK, I get the idea, but how do I do it, how do I build it? Or it was extremely research-based and academic in nature. And I love reading research, don't get me wrong, and I love diving into it. But that also often made it hard for me to take something directly from the research.
A) do I understand it? Do I know all the words? Do I know all the concepts or do I have to do a bunch of extra reading to understand it and contextualize it? And B) even if I do understand it from the reading I've done thus far, how do I actually take it and use it?
And I think that's really what this book is aimed at: bridging that gap, toward a more general technologist or general data person's understanding of these concepts. So, A, making it easier, making the learning curve less steep to get into the topics, and then, B, every chapter has an implementation section. So every chapter has actual code.
There's a repository. You can run the stuff. There's example data that I use. So my goal is to get people out of being privacy-interested and into being privacy-active, where they can actively use the code from the book, or use the libraries and concepts from the book, know the base theory, and then apply it to their work.
Birgitta: And that's sometimes like me, for example: the way I learn is that I always need some examples of practical application, otherwise it often just doesn't stick for me. And I had a look at the repository that you have for the book, and I really like how, in the notebooks, you almost have a little storyline. I saw things like, oops, that didn't work, but now we still need to identify this person. So there's a nice little narrative even in the notebooks in the repository, which I really liked.
Katharine: Well, thanks. Yeah. I think, like a lot of us, we learn by doing, right? And so I think there's a non-trivial number of technologists who just really want to play with something. In the notebooks, too, there are a lot of challenges, like homework: can you figure this out? Can you do this? And I'm hoping that there will also be some reader contributions to the notebooks over time.
Rebecca: So one of the things that really struck me in the book was the way you tie privacy explicitly to business goals. You actually had a sentence, and I'm paraphrasing it: "If you think that privacy conflicts with your business goals, you need to rethink your business goals"! Can you talk a little bit more about that? Because, as with that continuum you were talking about earlier, I do think some people still see these things in conflict.
Katharine: Yeah. I mean, there have been decades now of us collecting potentially the most data the world has ever collected, with increasing acceleration. That's been the trend, and it's been broadly accepted by industry. We've made amazing inroads in high-performance networking, high-performance computing, distributed computing, and all these things. And now, I think, finally, with both the climate change and climate disaster questions, but also with privacy questions and usefulness questions, there's a bit of pushback starting: can we do more with less? Is there the ability to do so?
And one of the big reasons why I got into privacy is something I started to notice. I was working in natural language processing, so some of the same technologies that things like ChatGPT now use. And what I noticed is that there was a never-ending quest for more data and the idea of bigger and bigger models, which, of course, we see now with the large language models. But the more data we put into them, particularly the more readily available data, the more we started running into quite toxic issues in the language modeling, where there were whole regions of the language model that were really abhorrent in a lot of ways. And a lot of that had to do with figuring out: what data do I actually want to use, do I want to be picky about the data I use, and do I want to think about the use of this model when I'm training all of this and doing all of this?
And that actually drove me to privacy, because we were asking these questions of what a model should be able to learn when given private or sensitive data. And there, one of the connections we saw is that if I have a model and it makes a choice based on a sensitive attribute, let's say your gender, or your race or ethnicity, then I've probably built a model that I need to question or think about.
And when we think about the business goals of implementing these models, it's kind of like when we look at the engagement models or the recommendation models that we have now.
They've been running for quite some time. But is what we're really after more clicks, more shares, more comments? Because that's what we've tended to optimize, and yet we find that sometimes these systems essentially end up driving engagement with the most enraging content possible.
And so a lot of times when we build a business model and it's in contradiction to our definition of, let's say, ethical use of technology or our definition of privacy, we have to start questioning what it is we're actually trying to do with the technology that we're building. And I think that's, I guess, not only how I got into privacy but also this point of, if you can't build it with privacy built-in, maybe you need to wonder what's the goal of what you're building.
Birgitta: So these abhorrent parts of the model that you're talking about, those examples are maybe not directly privacy questions, but it's a similar kind of problem, right? You have to think about what you put into the model, and that led you all the way back to the root, to this privacy question, yeah?
Katharine: Indeed. Yeah. And when we think about the language modeling, it was very open. One of the reasons why I think privacy in language modeling would be really cool is that we wouldn't have to use terabytes of data scraped from the web. If, let's say, the three of us wanted to get together and build a large language model, and we trusted each other enough to build a model together, we could presumably use our personal texts, which are often off limits to people like OpenAI. Right?
We could use our personal texts and combine them to create language models that are closer to how we speak with one another and how we speak individually. Of course, we would also need massive compute power, and I'm skipping over some of the implementation details. But the goal that I see for privacy in machine learning, and in, quote unquote, "generative AI systems," is that humans feel more trusting towards smaller groups, and that communities can form and actually create the data and create these models, which maybe one day can be collectively owned or collectively used, and that these contributions can maybe also be tracked and revoked and so forth, with respect to privacy law.
And the goal there is less of “let's scrape the entire internet and push it into a language model,” and more “let's think about whose data we can use, with respect to the users' needs, and build a language model that's better for everything they're actually trying to do.” That's the end goal, I guess.
Birgitta: Yeah. But to come back to Rebecca's question about privacy and its relation to business goals, Thoughtworks just released, together with MIT, a study about responsible technology, and asked a lot of businesses about the different areas of responsible tech they're thinking about.
And Rebecca, you might remember the numbers better than I do; I read it a few weeks ago. But privacy concerns were actually high up on the list, because it's become much more of a topic among consumers, with all of the different fail stories in the media and so on, right?
Rebecca: Yeah. And also interestingly, and frankly I was a bit disappointed in this, while many businesses acknowledged that there were true business objectives around brand protection, brand recognition, and recognition as an employer in terms of being able to recruit talent, compliance was still considered a big driver for their interest.
And so I do think the regulatory frameworks are also helping to drive some of that focus from a business perspective. And if that's what it takes to get there, so be it, but I would far rather they look at this from the perspective of the business opportunities that are made available by being privacy-forward and being seen in the marketplace as respecting privacy. But who knows.
Katharine: Yeah. I think there's an interesting connection there, though, because the changes we're seeing in the regulatory environment are driven by democratic wishes for better privacy controls. Folks are talking about this and putting it forward as, hey, these issues are important to me. What's not yet clear is who's going to win the consumer-facing part of that.
And I would hope, or one of the goals that I have, especially in my work here at Thoughtworks, is to empower more companies to be both data-forward, so still leading on data strategy and developing forward-thinking data initiatives, but also privacy-aware. Because what we're seeing right now is that it's often the very large technology companies that have deployed these systems thoroughly.
So they're not worried about compliance, because they're kind of leading the compliance in terms of the technology advancements they're actively deploying into production systems. And what I hope is that there are more players in that space, that it's not just Apple that gets to brag about their differential privacy implementation, but that it can be every company, or at least lots of companies, that can actually use these tools and empower their users to make consensual choices about their data use and collection.
Birgitta: So then what are these tools, Katharine? What are roughly the areas that your book covers, the practical things companies can do?
Katharine: Yeah. I mean, we start with the very basics, which is really thinking through data governance. And in this case, you would also want to think through AI governance or machine learning governance as well. With that comes knowing what your data is, ensuring that your data is properly cataloged or organized, documented and so forth, and understanding the consent and the other rules of your data collection. So that means also working with legal professionals, and basically ensuring that you're set up with the most basic of privacy protections, things like making sure there are tokens, or masks, or pseudonymization of personally identifiable information, or PII.
That's the basic level of stuff. And there I think there are still a lot of opportunities for every company to take a look at what they're doing and to improve not only their data understanding, which is going to help with any data science initiatives they have, but also data quality, data literacy across the organization, as well as compliance, auditing, monitoring, and privacy initiatives.
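As a rough illustration of the pseudonymization Katharine mentions, here is a minimal Python sketch. It is not from the book's repository; the column names, data, and salt are made up, and a real pipeline would manage the salt or token vault as a secret.

```python
import hashlib

import pandas as pd

# Hypothetical example data; in practice the PII columns come from your data catalog.
df = pd.DataFrame({
    "email": ["ada@example.com", "grace@example.com"],
    "purchase_total": [42.0, 17.5],
})

SALT = "replace-with-a-secret-value"  # keep out of source control

def pseudonymize(value: str) -> str:
    """Replace a PII value with a salted hash: joins still work, but the raw value is hidden."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()[:16]

df["email"] = df["email"].map(pseudonymize)
print(df)
```

Note that this is pseudonymization, not anonymization: whoever holds the salt, or can enumerate likely inputs, can link records back to people, which is exactly why techniques like differential privacy are treated separately later in the conversation.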
Birgitta: So first you need to figure out what you even have to, or want to, keep private; so you need that transparency.
Katharine: Absolutely. Yeah. Yeah. It doesn't help to apply a technology to everything. You need to kind of know what you're doing. So with a lot of these technologies, where they're at right now is it's not a generic one-size-fits-all solution. So knowing your data, knowing your use cases, knowing the rights that you have for what data you're trying to use, this can go a long way in setting yourself up for not only good experimentation but actually good practical and deployed usage of any of the new technology.
Rebecca: And so that's really the first step for an organization overall. But for technologists who are intrigued by this notion of privacy engineering and privacy by design, what does it take to actually get started in doing this? I know you reference many different toolsets that are out there, but what's the way to get started?
Katharine: Yeah. So I mean I think that the data governance is a really good starting place. So ensuring that, again, the understanding of the data is high, that you also understand how data moves through the organization and what rights or what use cases should be given what access. And this is kind of where you're at the basic starting point.
If you want to then take it to the next level, let's say you've covered the bases on that initial step, you might say, OK, we have some new marketing use cases, we'd like to compare data with another company, and we want to figure out how to do that in the most privacy-respecting way. And this is not only good for privacy, it's also good for the proprietary information in your company, for things not leaking every time you analyze a new partnership or sign up with a new data sharing platform. This is also your data as a competitive advantage, which I think is generally overlooked as an advantage of deploying privacy technology but is clearly one.
And then the book goes through three major technologies that I'm really excited about and that I think are production-ready. The first is differential privacy as a way to anonymize data; this is the strictest form of what we would call anonymization. Then there's federated learning and federated data analysis, where we actually don't centralize the data at all. We leave the data wherever it is. And this could be great for partnerships: you're working with a new partner; you don't send them any data, they don't send you any data, and you still perform analysis together. You send each other just the results of these analyses. This can go a long way, and it can also be combined with differential privacy should you need it. And then a third technology that I'm really excited about and that's covered in the book is a variety of types of encrypted computation.
And this means that we can actually compute, we can do data processing on data, without ever decrypting it. So we can process it, derive insights, even run machine learning on encrypted data. That again goes a long way in these partnerships, in any type of data sharing situation, to say, you know what, let's only share the results of this analysis; let's not actually share all of the data that we have here. All of these can be, and are, used in production systems today. And so the book takes you from, OK, you have your basics covered, into the new, exciting, and usable types of advanced privacy technologies that have really come out of research labs and into production systems in the past, I don't know, five to ten years.
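To make the combination of federated-style analysis and differential privacy she describes a little more concrete, here is a minimal Python sketch. It is not from the book's repository: the data, clipping bounds, and epsilon are made up, and a real deployment would use a vetted library (for example OpenDP) and careful privacy-budget accounting rather than this hand-rolled noise.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical local datasets that never leave each party.
party_a_ages = np.array([34, 29, 41, 55, 23])
party_b_ages = np.array([61, 38, 27, 45])

LOWER, UPPER = 18, 90  # agreed bounds; clipping limits each person's influence (sensitivity)

def local_aggregate(values):
    """Each party computes only a clipped sum and a count locally and shares just those."""
    clipped = np.clip(values, LOWER, UPPER)
    return clipped.sum(), len(clipped)

def laplace_noise(sensitivity, epsilon):
    """Laplace mechanism: noise scaled to sensitivity / epsilon."""
    return rng.laplace(loc=0.0, scale=sensitivity / epsilon)

sum_a, count_a = local_aggregate(party_a_ages)
sum_b, count_b = local_aggregate(party_b_ages)

epsilon = 1.0  # illustrative privacy budget per released statistic
noisy_sum = sum_a + sum_b + laplace_noise(sensitivity=UPPER - LOWER, epsilon=epsilon)
noisy_count = count_a + count_b + laplace_noise(sensitivity=1, epsilon=epsilon)

print("Differentially private estimate of the mean age:", noisy_sum / noisy_count)
```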
Rebecca: And I can kind of get my head around both the differential privacy and the federated learning, conceptually. But my brain kind of goes into fits and starts of thinking about computation on encrypted data. Can you go that next level deep on how in the world does that work?
Katharine: Yeah. Yeah.
Birgitta: Yeah. Just before that, real quick: it's the same for me. Years ago, I think at the Chaos Communication Congress, I saw a talk about sending queries to a server where the server must not understand your query but can still return a result for you, and that similarly broke my brain a little bit.
Katharine: Yeah. Yeah!
Birgitta: So yeah. I would love to hear some like for dummies summary from you!
Katharine: Well, I don't think I have any dummies on this podcast call right now, but I'm happy to take it step by step! So encrypted computation: it's really the coolest field ever. If I have to choose the coolest field ever, it's encrypted computation. It's a subfield of cryptography as a whole.
A lot of the core math that we use in cryptography uses this concept of a ring or of a field. So you know how a ring works: you have a clock. A clock is a ring. It's really easy. It goes to 12, and then it comes back around and goes to 12 again. And when we operate in a ring, we can use modular arithmetic, so remainder arithmetic. Right? So when I go past 12:00, I wrap around to 1:00, and to 2:00, and so forth.
And so these properties are actually often used in cryptography in what's called a field. So, if instead of choosing 12:00, I choose let's say a super large prime number — and you might remember this from setting up cryptography systems, and you're like, OK, there's a prime that we use, and there's a prime generator and so forth — this is a lot of how some of these systems work.
So let's say that instead of choosing 12 as the end of your wraparound, you use a really huge prime number. Well, one of the cool things that does is it hides your data. When it wraps around and goes again, we can use the properties of this field to essentially encrypt a number, hide it with a variety of methods, and then decrypt it: we use the field to say, OK, I want to take these numbers away from it in order to get back my original value. We could do that. I'm simplifying a little bit for ease of understanding here.
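As a toy illustration of this "hide a value by wrapping around a big prime" idea, here is a small Python sketch. The prime, the values, and the functions are purely illustrative and not taken from the book or from any real cryptosystem.

```python
import secrets

# A toy "clock size": a Mersenne prime, chosen here purely for illustration.
# Real cryptosystems use carefully chosen, much larger parameters.
PRIME = 2**61 - 1

def mask(value: int) -> tuple[int, int]:
    """Hide a value by adding a random pad and wrapping around the prime."""
    pad = secrets.randbelow(PRIME)
    return (value + pad) % PRIME, pad

def unmask(masked: int, pad: int) -> int:
    """Subtract the pad (mod the prime) to recover the original value."""
    return (masked - pad) % PRIME

hidden, pad = mask(42)
print(hidden)               # looks like a random number in [0, PRIME)
print(unmask(hidden, pad))  # 42
```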
And so in encrypted computation, one of the things we do is use this property, but only with very special cryptosystems or very special protocols. There's a subfield, also talked about in the book, called multi-party computation, where you can essentially split a piece of data into a few different secrets and share them with a few different people, and only if those people combine their shares can they use the data to compute.
So you have these two methods: either a special cryptosystem or these special protocols like secret sharing. And then you can actually use the same properties of the field to make sure you get a correct result, which is kind of crazy when you think about it. At the end of the day, this is all ensuring that the math works out so that when you add two numbers in encrypted space, say an encrypted four and an encrypted eight, you actually get an encrypted 12 as the result. And then you can use that same decryption method to reveal the final output.
Of course, many of these things are much more complex than a simple addition, right? But it essentially uses these base properties to ensure that you get a manageable result. But it doesn't work with every single cryptosystem. We have to use ones with homomorphic properties or we have to use special protocols like secret sharing. So you can't just do it with, let's say, RSA or something like this.
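Here is a minimal Python sketch of additive secret sharing, one of the protocols she mentions, showing the encrypted-four-plus-encrypted-eight example. It is illustrative only (toy prime, no networking, no protections against misbehaving parties) and not code from the book.

```python
import secrets

PRIME = 2**61 - 1  # same toy field as in the sketch above

def share(value: int, n_parties: int = 3) -> list[int]:
    """Split a value into additive shares that sum to the value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Combine all shares to recover the hidden value."""
    return sum(shares) % PRIME

# Each individual share looks random and reveals nothing on its own...
shares_of_4 = share(4)
shares_of_8 = share(8)

# ...but each party can add the shares it holds locally, and reconstructing
# those local sums yields the sum of the original values.
shares_of_sum = [(a + b) % PRIME for a, b in zip(shares_of_4, shares_of_8)]
print(reconstruct(shares_of_sum))  # 12
```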
Birgitta: And you were talking about computing with numbers, so I'm guessing this is all based on data that has already been transformed into numbers, into features for example, and so on. Yeah?
Katharine: Absolutely. Yeah. That's an important point.
Birgitta: So there are no names in there anymore, or, I don't know, let's say colors of clothes or stuff like that; it's all been turned into numerical data. Yeah?
Katharine: Yeah, absolutely. So you need to do the same encoding that you would expect like for a typical data science problem or machine learning problem where you take, let's say, all of the colors you can represent and you decide red is one, and blue is two, and so on and so forth. And with these categorical structures, you can then still aggregate in encrypted space and so forth.
And there are a few code examples in that chapter of the book, obviously, and some notebooks that go through some of these basics of how fields function and how the basic crypto math works, in case people want to poke around and give it a try. It's pretty cool, when you start reading the mathematics of crypto, how much it actually gives us, and how much thought has gone into the ways we decide to obfuscate data when we encrypt it and why that works.
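Tying the two points together, here is a small Python sketch of encoding a categorical field as numbers and then counting categories with the additive secret-sharing idea from the sketch above. The categories and records are made up, and this is an illustration of the concept rather than anything from the book's notebooks.

```python
import secrets

PRIME = 2**61 - 1  # toy field, as in the earlier sketches

def share(value, n_parties=3):
    """Split a value into additive shares that sum to the value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    return shares + [(value - sum(shares)) % PRIME]

def reconstruct(shares):
    return sum(shares) % PRIME

COLORS = ["red", "blue", "green"]  # hypothetical category list agreed up front
records = ["red", "blue", "red"]   # one party's hypothetical raw records

def one_hot(color):
    """Encode a category as 0/1 indicators so counts can be summed in shared space."""
    return [1 if color == c else 0 for c in COLORS]

counts = {}
for i, color in enumerate(COLORS):
    # Split each record's indicator for this color into shares...
    shared_indicators = [share(one_hot(record)[i]) for record in records]
    # ...let each party sum the shares it holds, then combine only those local sums.
    local_sums = [sum(party_shares) % PRIME for party_shares in zip(*shared_indicators)]
    counts[color] = reconstruct(local_sums)

print(counts)  # {'red': 2, 'blue': 1, 'green': 0}
```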
Rebecca: Well, Katharine, what's your favorite message in the book or what's the one thing that you would want to get across to people about your philosophy on data privacy, privacy engineering?
Katharine: Yeah. I mean, I think there are a few messages that are really core. One of them we already talked about a little bit, and Rebecca, I really liked you bringing this up: if privacy is in direct contention with what you're trying to build, spend some time pondering that. One of the sentences I have in the book that I liked a lot was “some models should never be built.” If we know that we're building a model that will harm people, if we know that we're building a model that's going to directly change or impact somebody's life in a severely negative way through no wrongdoing of their own, other than living in the wrong place or being the wrong person, then this is obviously something we need to question.
But a second one that I was excited about in this conversation, too, is how do we empower people with their data, how do we work better collectively and communally with data with one another, and how can we maybe even change the way the data landscape looks? If we were to really offer privacy-respecting alternatives, could we build really cool GPT models that have fewer toxic problems? Could we change the way that people decide to share data with one another? And this, I think, is something I'm really excited to see in the future.
Birgitta: So much maths magic these days in the industry...
Katharine: Indeed.
Rebecca: Well, thank you so much, Katharine, for joining us today and for writing the book. I think it's going to have an impact on lots of people who want to understand a bit more about how we can make our systems more privacy-aware, more privacy-forward, and therefore more responsible. So thank you, Katharine.
Katharine: Thank you, Rebecca. Thank you, Birgitta.
Rebecca: And thank you, Birgitta, as well for joining me. And I hope to see, or to have, everybody on the next edition of the Thoughtworks Technology Podcast.
Rebecca: Thank you.
Katharine: Thank you.
[MUSIC PLAYING]