Team topologies and effective software delivery

Podcast hosts Ashok Subramanian and Evan Bottcher | Podcast guests Matthew Skelton and Manuel Pais

May 20, 2021 | 53 min 52 sec


Brief summary

We catch up with the two co-authors of Team Topologies: Organizing Business and Technology Teams for Fast Flow to hear about their ideas on enabling enterprises to become more effective at software delivery — and the influence of Conway’s Law, team cognitive load and responsive organization evolution.

Podcast transcript

Ashok Subramanian:

Hello everybody, and welcome to this edition of the Thoughtworks Technology Podcast. I'm Ashok, one of your regular co-hosts, and I'm joined today by Evan.

Evan Bottcher:

Hi, it's Evan Bottcher here from Melbourne in Australia. I'm the head of engineering for Thoughtworks in Australia. To give a little background, I wrote an article a few years ago on platforms that included the concept of platform teams. And I do a lot of consulting around engineering organization design and how to shape teams. So that's what I'm bringing to the conversation today.

Ashok Subramanian:

Brilliant, and that seems quite relevant and topical given the two guests we have today to discuss Team Topologies, and who better to do that than the people who wrote the book themselves? So Matt and Manuel, would you introduce yourselves to our guests, please? To our listeners, sorry.

Matthew Skelton:

Sure, hi, this is Matthew Skelton, co-author of the book Team Topologies. I'm here with my co-author as well.

Manuel Pais:

Hi, I'm Manuel. Like Matthew said, I'm the other co-author, and yeah, I do similar work to what Evan mentioned, helping organizations understand their team structures and how to achieve faster flow.

Ashok Subramanian:

So maybe we could start by giving our listeners, especially people who might have heard of Team Topologies and maybe even skimmed it, a brief overview of the book and what prompted it in the first place.

Matthew Skelton:

That's a great question. So, what I realized the other day was that, effectively, our book is... You can see it as a 215-page rant against a huge number of things that are wrong with software engineering in the modern world. It was born out of our own experience working with organizations and seeing the frustration inside them, at the engineering level, the manager level and the business level, with things not working well. We've been in situations ourselves where we've experienced a lot of the pain that we're trying to deal with in the book. And that's why the book ends up being really quite practical, opinionated and directive about how we think things should be arranged to make building and running software systems more effective in the current context. So that was our aim in writing it: to set out a set of patterns that, we think, should help organizations become more effective at software delivery.

Manuel Pais:

Yeah, it was also based on our experience. There are different inputs to the book. The main one was our experience consulting and seeing the problems that different organizations were having when they tried to adopt DevOps and continuous delivery but then didn't see the results. And when you go a bit deeper, you see that they often have problems with misunderstandings between teams, lack of communication, the sort of people problems that we know are harder to solve. It's not just about having a new toolset, which, okay, brings some benefits. But at the end of the day, teams might not be communicating, or they might be over-communicating; those kinds of problems. So the book helped get that message across, with some of the patterns we saw working well and what didn't work so well. And fortunately, a lot of organizations have seen this as a new vocabulary to talk about teams, their purpose, and how to evolve the interactions between them.

Matthew Skelton:

Now, interestingly, Manuel and I both have a substantial amount of experience working in what's now known as CI/CD. So that's the center of the software delivery world, which is the deployment pipeline, popularized by the great book by Jez Humble and Dave Farley back in 2010. We both did a lot of work around continuous delivery and thinking about how organizations are set up. And actually, the concepts in continuous delivery are absolutely foundational, and more recently, since our book was published, I've realized how central it is to get those concepts embedded. The idea of an end-to-end flow of value in the organization using a deployment pipeline is... Without that, we can't have a fast flow of change.

A lot of the stuff in Team Topologies is not going to happen without the solid engineering practices underneath. And so our experience working at, if you like, the heart of the organization, the deployment pipeline level, shaped our thinking significantly around what the foundations are for an organization that can have a fast flow of change in the software space.

As for some of the other influences on the book, I actually did a blog post back in 2013 looking at different kinds of organizations and different team responsibility boundaries. Back then we called it DevOps. DevOps now means something different, but back then, DevOps was all about the relationship between development teams and operations teams, as they were called then.

That blog post got lots of traction, lots of interest in the diagrams. They're like Venn diagrams, with different colors representing different kinds of teams. So Manuel and I turned that into a website called DevOps Topologies, which is still up and open source, well, Creative Commons. That got even further traction and helped organizations, including Netflix and Condé Nast International, think about the relationships and boundaries between different teams, the different dynamics, and the different trade-offs for different responsibility boundaries. That led us to more conversations, and eventually we decided to go well beyond that original static, Venn-diagram view of the world and incorporate many other dimensions that are actually important, things like Conway's law. So there's sociotechnical mirroring between the communication paths in the organization and the software or system architecture that typically results. But also things like team cognitive load, which is a term that we've effectively invented or shaped in the book. Applying cognitive load at a team level is a very new concept.

And then making sure that organizations are evolvable, partly by introducing constraints on the types of teams and the types of interaction between teams. Effectively, that combination of constraints ends up being what we call enabling constraints in a complex adaptive system. We're reducing the degrees of freedom that the organization can use precisely so that we've got a better ability to listen to the signals happening inside the organization. So that's where we ended up. It was a journey of ideas, certainly inspired by the continuous delivery book and a bunch of other things.

Evan Bottcher:

I've got to say that at Thoughtworks, in our consulting work, it's actually been quite impactful. The creative constraint of a limited number of patterns, and the pattern language, has been really, really powerful. Many of us, and I know I did, said when we picked up the book, "We have about three half-finished slide decks that we've done with clients to describe many of these patterns that have emerged in organizations we work with." But you doing that work, publishing and explaining these patterns in such an articulate way, has been really, really helpful.

And some of them have run counter to a lot of the trends of the last few years towards long-lived, product-aligned teams, your stream-aligned teams. Introducing some alternative patterns has been a really great conversation starter, and has actually shattered some defaults that people have gone to and some dysfunction that's emerged as part of that. I wonder, when you were putting together the material, how much of it was observation of existing patterns in teams and how much of it was your own ideation of a recommended way forward?

Matthew Skelton:

It's a combination of observation of what works and a bit of ideation. But also starting with some principles, fundamentally three things. If you want to summarize Team Topologies, it's: optimize for fast flow, rapid feedback and limits on team cognitive load. I mean, the Conway's law stuff is useful, but in some respects it's not absolutely at the heart of it. If you look for organization designs that optimize for fast flow of change toward live systems, rapid feedback from those live systems so that teams can course-correct, and limits on team cognitive load, then you will end up with a design, with a set of principles and practices, that looks like Team Topologies. That's fundamentally what you would get.

You might have some slight differences. You might have five different types of team, or four different interaction modes, fine, but if you start with those principles, I would find it very difficult to see how you'd get something different. You want to avoid hand-offs for fast flow; you don't want lots of waiting around for people to do different things. And you want end-to-end responsibility, so you're very close to the customer, whoever the customer is. But increasingly you can't expect teams to just take on more and more stuff, so you need to limit cognitive load. Therefore, you need something like a platform. You might call it something different, but you're going to have to take some of that stack away from teams to help them focus on the stuff that's more germane and relevant. Effectively, we started with those principles and then looked at what different organizations were doing.

And looking back, what we ended up doing was sort of reverse engineering what some organizations were doing. Amazon and AWS are a classic example, whether or not the story is true about the memo from Jeff Bezos back in, whenever it was, 2002, saying each team will communicate with other teams via clear APIs and treat other teams as if they were external, and all services will be externalizable. Then there's the two-pizza team thing. A lot of people miss the point of the two-pizza team; it's nothing to do with food or what have you. It's about high trust, and therefore a high degree of contextual awareness about changes, and the ability to make those changes very rapidly and course-correct. And crucially, we're limiting the cognitive load on the teams by keeping them small. A team cannot take on more and more stuff in the system. Therefore, we're going to have to compose our total solution out of a number of smaller, discrete, nicely decoupled services, which is good software engineering practice, right? Good engineering practice in general.

So we effectively did some reverse engineering on what was already out there and worked out what's really going on at Amazon. Yeah, two-pizza teams sound all nice and cutesy, but underneath that are a whole host of really key principles that are fundamental to building scalable software systems. We tried to unpack them from other places too, and looked, for example, at what Spotify were doing and what the intent was behind the Spotify model, as people started to call it. What was actually going on there, beyond the names of the teams and the roles and that kind of stuff? So that's what we tried to capture in Team Topologies: a reverse engineering of patterns that seem to work well, or seem to have had some value at a particular point in time in certain contexts, working out what problems they were trying to solve, or what they were optimizing for, and what the principles were behind those external patterns, and then characterizing that in a way which is forward-looking.

Manuel Pais:

At the end of the day, Team Topologies is, as Matthew mentioned earlier, really focused on organizations that want to be able to respond faster, deliver faster, and also have healthy teams that are long-lived and sustainable. Sometimes people ask us, what if an organization doesn't have those kinds of requirements? That's okay; then maybe Team Topologies is not the ideal approach for them. We just don't see many organizations these days that can afford not to go fast and adapt quickly, as things change so fast.

But even Team Topologies, at the end of the day, is a trade-off you have to make. If you want to introduce the ideas of Team Topologies, the constraints, the vocabulary, and the understanding of why these constraints are useful, that's an effort you have to invest so that you can then look at achieving fast flow, and in particular stream-aligned teams. Yes, those kinds of long-lived, cross-functional teams with end-to-end responsibility are hard to achieve, and they need to be grown over time. It's not like we're going to create a new model today and tomorrow have stream-aligned teams that are high-performing.

That's not how it's going to work. You need to invest in these teams. You need to look at what kind of support they're going to need in terms of platform and enabling teams, other things in terms of engineering practices, but also funding practices, et cetera, which is a little bit outside the scope of Team Topologies. It has to be an investment in aligning to these ideas, these principles that Matthew was talking about, if you want to achieve faster flow, be able to adapt quickly and have healthy teams. That might not be a trade-off that every organization wants to make, but we're not saying that everything about Team Topologies is going to be easy to adopt and make things much better. That's the fallacy, and it's not the fault of whoever wrote the Spotify article or something like that. They were just explaining how this approach was helping Spotify at the time. But when people copy that and try to replicate it in their organization without understanding why they're doing it, what's different and how to adapt it to their needs, then it often becomes a problem, with a lot of cargo culting and buzzwords that pop up and that people want to follow without getting the benefits.

Matthew Skelton:

That was one reason why we wanted to avoid specific roles, a bit like, I don't know, in Scrum, where you've got the Scrum Master and what-not. You might have these particular individual roles. We avoid that in Team Topologies precisely because we really want to avoid yet another, I don't know, industry certification drive for people to jump on the bandwagon and say, "Well, I'm a certified this, and therefore we're now officially Agile version seven," or whatever it is. So we took some care to avoid that anti-pattern and to emphasize the need for organizations to internalize the principles: fast flow, rapid feedback, limiting cognitive load on teams.

Some other principles include high trust: looking at the size of groups inside the organization and trying to maximize the trust in there, because if we've got high trust, we can have a faster flow of change, since we don't get the distrust that leads to people stopping things or wanting to approve or inspect things. By coming back to the key principles all the time, we hope there's more chance that organizations can actually continuously adapt and evolve, rather than just latching onto the idea that if you have a nice fixed set of teams that look like this, then all your problems will be solved. That's what we try to do, anyway.

Evan Bottcher:

I think about that transition you're talking to, Manuel, where organizations try to adopt an approach. I've been involved in organizations where they've tried to move towards product-aligned, long-lived team structures, but the underlying architecture does not match. So there's the cognitive dissonance, I guess, that they bump into when they say, "No, but we want to have teams with independence and autonomy and the ability to respond to a customer need," but actually the systems underneath don't allow that. We coined this years ago in the Technology Radar: the Inverse Conway Maneuver. It's a very painful process for many organizations as they realign into a new structure and the architecture underneath has to tear itself apart to match it. It takes time, causes a lot of pain and can be quite risky for some organizations.

But one of the things that appeals to me about the Team Topologies patterns is this concept of the complex subsystem, which gives us a language to describe systems that are not yet safe to change in a self-service way, or to be placed under the control of the stream-aligned teams. So they may need some different treatment. A question that came up in a webinar I saw recently, Matthew, where you were talking about this, was the distinction between this complex subsystem team and a platform, or whether there's an evolution over time between those things. Do they naturally fall into a more self-service platform? It's interesting.

Matthew Skelton:

So part of the challenge here is that we're seeing a fundamental shift in organizational capabilities based on modern software, particularly cloud software, or cloud-inspired software we can say, things like infrastructure as code. Going back 20 years, making changes to how your business runs, or to the kind of services you provide, took some time. Now, potentially, particularly if what you provide has an online digital element to it, or maybe is entirely digital... Take insurance: 20 years ago, someone would maybe visit a website, but probably just telephone or send an email or something. Now it's all self-service for the customer. They can log in. They can change their insurance details. They can request a new quote. And a lot of the calculation of that new insurance premium is done automatically or semi-automatically. It's digitally enabled.

We're not saying every organization is a software company; I think that's misguided. But every organization that wants to maintain a competitive edge or some differentiation can now use software to do so, and so we're able to make changes much more rapidly. What this is showing up is that lots of organizations don't even have a clear picture of what they provide or what their purpose is. Approaches like Team Topologies, domain-driven design and data mesh are highlighting that lots of organizations have a huge lack of clarity about what they actually do and what their purpose is. So it's no wonder it's difficult to align to the business purpose, because there isn't a business purpose, or it's a very mushy, ill-defined one.

In the past, that was less of a problem because things moved so slowly. You had systems that were, I don't know, owned or managed in a way that needed big changes to happen one after another, with multiple different people coordinating in order to get a big change out the door. Now, partly because of technology but partly because of other techniques, we're able to make changes much more rapidly. Therefore, the organizations that are succeeding are those that have aligned their technology architecture to the streams of business change, or streams of organizational change, that are needed. And yeah, approaches like Team Topologies are highlighting this mismatch between the organizational and business practices of the past and the ones that are possible with new techniques and new technologies.

Manuel Pais:

Yeah. And you mentioned the Inverse Conway Maneuver, or Reverse Conway Maneuver, which was, I think, to a large extent promoted by people like James Lewis from Thoughtworks. To actually do it right, there are many aspects to consider. One is what Matthew just said: if you don't understand what your actual business streams of value are, then you're not going to be able to match the architecture to those streams of value, because things are going to be coupled and you won't have a good understanding of which architectural services, if you like, you want to decouple. And then there are other layers of coupling, where you might have an architecture that looks more or less decoupled, but teams still depend on each other because they share infrastructure or tooling, or they're interlocked through some processes in the organization. So we still don't get the autonomy of more independent teams working on different services. There are several layers, but it starts, obviously, with what Matthew said: if you don't have clarity on what your business streams are, how can you even align either the system architecture or the teams to those streams?

And going back to your question about complicated subsystems and platforms: we actually call them complicated subsystems because we want to avoid the term complex, because of [inaudible 00:23:39], and the fact that it usually refers to situations where you have emergent behaviors and properties that you cannot really control. But anyway, for a complicated subsystem team, the starting point is that it would be too much cognitive load for a stream-aligned team to be responsible for this complicated part of the system. That's the starting point, which means you might have one complicated subsystem that is only used by one stream-aligned team, and that still makes sense because it reduces the cognitive load on that stream-aligned team. What we see in terms of evolution in general is that a complicated subsystem might be in constant flux, but at some point the cadence of change starts to slow down. And as the technology evolves, in many cases you see better solutions, more third-party solutions.

For example, I've worked with systems that used face recognition subsystems, which were, in fact, complicated. You needed a PhD to work on that, but these days there are good solutions out there. So at some point you could expect to move the complicated subsystem, with some changes, into a platform service, where perhaps you rely more on a third party, or it just has a very slow cadence of change that doesn't justify having a team around it. If we have the right documentation and support, then it can become part of a platform. In fact, we expect a complicated subsystem team to adopt very similar behaviors to platform teams in the way they provide that service or subsystem to other teams.

Matthew Skelton:

Yeah, we didn't put it in the book, but we've since realized a complicated subsystem is like a mini platform. A lot of behaviors are very similar. We don't use exactly that terminology, but effectively that's how you should see it.

Manuel Pais:

It's one of the challenges we've seen now that Team Topologies has been out for about a year and a half and more organizations are adopting it. Some organizations think they need more complicated subsystem teams than they actually do. I think that's related to the fact that these seem similar to the traditional component teams, which, when you look at fast flow, are typically a bit of a problem, causing bottlenecks when many teams depend on the same component team. So that's one of the challenges: understanding that we should limit the number of complicated subsystems as much as possible and try to move them into the platform when we can.

Evan Bottcher:

Yeah, there's no real limit to people's ability to read some insight, apply their own world view to it, and then say, "Okay, yes, we've got 1,700 complicated subsystems," that are actually just component teams, each working in a release train across the whole organization.

Manuel Pais:

Yeah. It goes back to what we were talking about earlier: trying to copy/paste these models without actually getting the underlying principles and ideas and why we're adopting this model.

Ashok Subramanian:

Just as a follow-up on the complicated subsystem, I think one of the challenges we see quite often... And maybe this also mirrors some of the points Matthew and Manuel were making about the purpose of an organization. A lot of the time, when you go and try to identify what the "business," business obviously in quotes, is trying to do or achieve, you end up getting pointed towards "you need to go and ask the team that manages system X," or whatever.

And really, it's almost the flip: the technology is driving or defining what needs to be done. In those sorts of situations, where systems have become quite large and almost define and control what's happening within the organization, what would you suggest people consider or look out for when transitioning those complicated subsystems into what might ultimately become multiple stream-aligned teams? Or what might that transition potentially look like?

Matthew Skelton:

That's a nice question. Before we wrote the book, we did some work for a global organization, though this particular part of it was based in the UK. Manuel and I were working on getting continuous delivery in place, so the deployment pipelines and stuff like that. But there was a huge monolithic system in place already, and there were some really interesting discussions to watch at the time, looking at the dynamics of different groups in the organization effectively positioning themselves around this technology.

Manuel Pais:

And also culturally [crosstalk 00:29:46].

Matthew Skelton:

Oh, yeah. Culturally, for sure.

Manuel Pais:

In different countries, you need to be mindful of the differences. Even the national culture is different, and I think that had an impact in that organization as well.

Matthew Skelton:

You've got to be careful about some of the assumptions you might make. In this particular case, we started off assuming that we would be able to get some aspects of this monolithic system under something that looked a bit like continuous delivery, with some automated tests. It turns out that this particular system had been sold from one multinational technology provider to another about four or five times, which probably indicates something about its suitability in general. But anyway, it turned out it didn't have a proper way to test it, and all of the logic resided in database tables, which made it extremely hard to test and tease apart. So that's a really good indication that you sometimes need a proper low-level understanding of the technology to work out the scale and size of the problem. You can't just immediately assume that you can apply some of the patterns.

And that's a really good example of where the technology was driving the kinds of teams that were needed, precisely because... It took 40 minutes to start a developer environment for the system, 40 minutes on a massive virtual machine in the cloud. I think they managed to get the build time for the system down from something like 48 hours to 21, and they thought that was great, just to build a new version of the software. At the time, the people there were like, "Oh, wow, this is amazing. We've more than halved the..." Well, yeah, but that's still way too long. I think eventually they got it down to something like four hours. But again, that's still [crosstalk 00:31:46] expect.

Manuel Pais:

I think to have a proper live environment that could scale and cope with demand, it took two weeks just to set up the infrastructure.

Matthew Skelton:

Yeah. Maybe these are edge cases, maybe not. But there are definitely some situations where you don't want to over-promise what you can achieve. In some of these situations, maybe the right thing to do is to hide some of that awkwardness behind an interface, to allow you to innovate around the edge or on top, and then focus on getting things in place where you're providing APIs into this older, more difficult system. This is all standard stuff with mainframes and things like that. But certainly, take the time to do a proper technical deep dive into the potential limitations of whichever system you're working with.

In fact, this particular system was more limited than mainframes, because at least on mainframes you've got the ability to do virtualization and use some modern testing tools. This particular thing was a real mess.

So yeah, don't over-promise. Don't expect you can just rush in there and retrofit something that's designed for, literally, the two words on the cover of the book: fast flow. That's what it's designed for. We should not expect to be able to retrofit that onto a system that was designed for something else, for example immediate consistency, with a single relational database or a single logical relational database. If they've optimized for immediate consistency, fine. It's going to feel very different from something where eventual consistency and separate streams of stuff are what we're aiming for.

Manuel Pais:

So in terms of that transition to stream-aligned teams, I feel like, as we were saying before, the first step is starting to understand what those streams actually are. And that's an evolution as well. In the example Matthew was giving, it's not about the organization stopping everything and saying, "Okay, what we need is to," let's say, "adopt DevOps because it's going to solve all these problems. And now we're going to adopt Tribes, or whatever the latest trend is," and expecting that to solve our problems. In fact, we need to start by looking at what the streams are and evolving from where we are towards a better place, a better fit for faster flow.

But it's not a revolution. A big reorg is typically not going to achieve that. Sometimes it's going to be a painful process where we start to figure out, like in this example, that there are technical issues preventing us from achieving faster flow. And then there are team organization issues. In that example, there were also behavioral issues: people who were used to working in a certain way and didn't see themselves as bound to internal or external customers, so they just wanted to do their work as they always had. So there are a number of factors, behavioral, technical and organizational, which mean it's going to be a journey to get to a better place. When we're helping clients and their teams, we're not trying to create a new model with only stream-aligned teams, or mostly stream-aligned teams plus the platform, et cetera. What we're doing is understanding where you are now and which teams you have, and then helping those teams figure out: are we more of a stream-aligned team? And if yes, what are we lacking or missing to be more in line with the ideas of a stream-aligned team, if that's what we want to be?

So then we often get into the responsibilities and capabilities the team has. People like John Cutler, for example, do a lot of great work and writing around product development and also around team dynamics. He has some really good material on the domains of responsibility for stream-aligned teams: you might have teams that basically just build and test; then you have teams that are actually able to support their own service in production; but you still have a lot of teams with very little input into the actual product or service development, understanding the customers and what they need, and then being able to experiment. You start to see this range of responsibilities that you would expect from a truly autonomous, self-sufficient stream-aligned team. So it's not going to be a one-step change. But we can look at teams today and, with these patterns and these topologies in place, start looking at what's missing and taking steps towards it.

Evan Bottcher:

I've noticed that in the last few years there's been a big emphasis, and it's been a big part of my own consulting, on organizations that want to introduce an internal platform to be able to go faster and reduce that burden. There's a beautiful description in the book of cognitive load as a way of describing what you're trying to reduce to enable teams to move faster. I wonder if we could explore that a little, particularly the importance of product thinking when it comes to building that platform.

Matthew Skelton:

For us, it was a real light-bulb moment when we started thinking about how this idea of cognitive load would apply to whole teams. It resonated with things we'd effectively been seeing for many years anyway, but applied at a team level it suddenly starts to make a little more sense. So we talk about the concept of team cognitive load in the book because, ultimately, what we want teams to focus on is the thing that's important for the organization. If you're an organization working in banking, then a lot of teams will be focused on banking, because that's the differentiator. Your teams should be primarily focused on the differentiating aspects of the work because, particularly given the pace of technology change now, so much is coming along so quickly from external providers, cloud providers, whatever, that if we're working on non-differentiating things, there's a real risk we're just going to get left behind.

So we've got this strong idea that we need to be focused on the things that are differentiating for the organization at this point, and to try to minimize the cognitive load on teams for things that are extraneous to that. This idea of minimizing extraneous cognitive load for stream-aligned teams is really what drives the other three team types. The reason we have those other three team types, enabling team, complicated-subsystem team and platform team, is primarily to reduce cognitive load on stream-aligned teams. That's it. In the past, an internal platform might have been there to share hardware, or to share license costs, or something like that. And that was a reasonable design decision back in the day, when hardware was extremely expensive and scarce and you only had one provider to choose from if you wanted a relational database, and so on. So it's not that that stuff is inherently bad; it's just that the dynamics of the business landscape have changed. Like we said before, we're trying to optimize for fast flow of change.

Say you've got a stream-aligned team that wants to go very, very quickly in a particular area, working on business-domain-relevant things; let's say it's financial transactions. If they want to go really quickly, then you could allow them to build their own infrastructure. If you want to allow them to go very, very quickly indeed, maybe they need to create their own special database. Maybe there's nothing currently on the market that suits them, and the right thing to do at this point in time is to create their own database. Fine. If that helps them to go quickly and safely and deliver customer value, then let them do it, if the business case is there and the organization is willing to pay for it. If that gives them the autonomy, the speed and the safety, why wouldn't you do it, if the organization has got the money?

But that team is then dealing with financial transactions for consumers, building a lot of infrastructure and maintaining this custom database. At some point, that cognitive load is going to get in the way of delivery. So at that point you've got a choice. What do you do? Do you change the skills mix? Do you send people on training? Do you switch technologies to make things easier, move up the stack, if you like, to a simpler language or whatever? Or do you do something like moving some of that stuff into a platform? All of the thinking around platforms really should be driven by this combination of fast flow, rapid feedback and limiting cognitive load. From our point of view, that should be the driver. That should be the thing that tells you, or at least informs, what the right boundaries are, not just "Oh, we definitely need a platform because this stuff is infrastructure." I don't think that's true. It's entirely reasonable for a stream-aligned team that's focused on delivering pizzas or whatever to manage its own infrastructure, if that's the right thing to do for speed and safety.

But you've got to balance that against the available cognitive load, or the available cognitive capacity: do they have enough time to think about the main business domain? If managing that infrastructure means they don't have enough time to focus on selling pizzas, then you've got a problem, and you want to take some of that responsibility away. So it's a trade-off, right? It's an engineering trade-off, and from our point of view that's what should really drive the thinking, particularly around platforms.

So the starting point for a platform should be: whatever the platform does, however it does it, it should not increase cognitive load on the teams that are using it. For us, that's a really important starting point. That should be the North Star for the platform: are we increasing the cognitive load on teams that are using our platform? If we are, we're doing it wrong. Then we can start applying things like product management techniques, self-service and all these other things as well, which are super important. But if we apply all of these great techniques and we're still increasing cognitive load on the teams using the platform, we're doing it wrong.

Manuel Pais:

Yeah. Here's a very simple, straightforward example that maybe many listeners will relate to. If your idea of a platform is, for example, "We're providing Kubernetes to our internal teams," and what you do is create some clusters and then tell teams to use them and deploy their services there, then that is very likely increasing cognitive load, and probably by a lot, because now you're asking the development teams, or stream-aligned teams if you like, to learn a whole load of new information about this new platform. And yes, there's great Kubernetes documentation, but if you're asking teams to read all of that just to be able to do the things they were doing before on a totally new platform, that's a massive increase in cognitive load.

So I would say, and I say this in the training we do on platform engineering, that platform teams should almost have a kind of startup mentality. Yes, first of all, you're providing some technology and some new, or better, ways of doing things; that's the starting point. But then: how do we make this easy to consume and easy to understand? And how do we actually provide even more value to our internal teams by understanding their specific needs? How can we not just provide access to the technology, but actually provide good abstraction layers, and the differentiators that Matthew was talking about, for our platform as a product?

And in your definition, Evan, of what constitutes a modern digital platform, which we reference all the time, you say it should be a compelling internal product, right? I think that definition is really good, because compelling means the customers of the platform see the value. They see, "Okay, this makes my life easier, because now I don't have to understand all these details about Kubernetes deployments or what have you. You've provided me a better, higher-level abstraction so that I can just focus on the things I need to do for my product or my business stream. And I know that with a simple configuration file or something, I can get the service out into the live environments."

And obviously that applies to all sorts of other platform services you might have. So think about things like: what is the value proposition of the platform and its different services, and what is the actual value we're providing internally? How does this differ from the development teams directly using some external, third-party option? Because we have that internal context of what our teams need, or at least we can talk to them directly and understand what they need, that should give us a very strong competitive advantage compared to external organizations, right? But we don't always see that being leveraged adequately. Many platform teams think, "Well, we're just making technology available internally." That's not really the value, and it's not necessarily reducing cognitive load.

And finally, if you have that startup approach, or product-driven approach, then you should be very aware of the adoption cycle, right? The adoption life cycle for products also applies internally to platform services: you have early adopters, people on teams who are more engaged and willing to put up with the rough edges of the initial versions of the service and to provide feedback, and then you'll have an early majority and a late majority. So you also have to understand, or at least have an idea of, where your service is positioned in the adoption life cycle internally, because you might need to do different things at each stage. If we're now targeting the late majority, then we need to understand why they are not adopting this service. What frictions might they have? Are we missing some workflows that are important for certain teams and aren't supported by the service? So I think it's crucial to have that mindset of the platform as a product.

Matthew Skelton:

One phrase we came up with after publishing the book, and in retrospect we wish it had been in the book, is: the platform is a curated experience for engineers. That's the starting point. The starting point is not technology; the starting point is, what's the experience for people using it? Does it increase their cognitive load? Does it make their lives easier? Does it help them to focus on their main area? Curating the experience then starts to get us thinking about product, design thinking, user experience and all of these areas that are relatively well understood now, in 2021, as we're recording this, particularly for consumer software services. So we're taking that rich body of awareness and practice and applying it to an internal platform.

It's not a straightforward translation, but the key thing is that there are a whole lot of people in the industry who already have these skills. What we're doing is applying them to internal customers, the stream-aligned teams, software development teams, engineers, testers and so on, and to this internal platform product. So we don't have to invent a whole bunch of new techniques; we can reuse and adapt existing techniques that have already been proven, proven at scale by cloud companies valued at billions and billions of dollars, who have used these techniques successfully in the wider market. So starting from that sense of needing to curate an experience for our customers, the internal teams, is again, I think, a kind of North Star: a good way to guide the decisions we take about a modern platform.

Manuel Pais:

Well, we still see many clients who don't even have a product management mindset, or even the product management role, inside their platform teams. With some clients, we've actually recommended not just starting to have product management roles in the platform, but putting your most experienced product managers in those teams, because that's where there's the least awareness. In your development teams, or stream-aligned teams, it's probably more common to find people focused on product development, user experience and so on. But it's actually in the platform where you probably need the most experienced people, to help these teams adopt different ways of working and look at who the customers of the platform are.

Evan Bottcher:

Well, I'm sure we could explore this much further. There's a whole book, a bunch of additional material and lessons to be learned here. But I think we should start to wrap this discussion up. Where would you suggest listeners go next to find out more? Obviously there's the book, but is there additional material that you're putting together?

Manuel Pais:

Yes. There's our website, teamtopologies.com, which is where we have new industry examples. We recently published some infographics that have been quite well received, on getting started with Team Topologies and Team Topologies in a nutshell, which help people at least start talking about this in their organization and understand the why of Team Topologies. We've also launched a Team Topologies Academy. The first course is a distilled version of the Team Topologies book: in a couple of hours, people can start getting the vocabulary and a basic understanding of the principles and patterns in Team Topologies. That's quite useful when organizations want to adopt some of these ideas and want a shared understanding across all of their staff. On top of all those resources, we also have a number of repositories on GitHub, at github.com/teamtopologies, or you can go to teamtopologies.com/tools. They're basically a set of templates, assessments and useful techniques that we use ourselves with clients when we're helping them understand this and start evolving towards faster flow.

Ashok Subramanian:

That's great. I've looked at the website and there's definitely a bunch of great resources there, including, I think, the templates you referred to, Manuel. We've used them; I've used some of them myself. So they're definitely worth a look. I'm sure we could go on for another hour and a bit quite easily with questions, but to me, one of the main takeaways from this is that, beyond the patterns you've identified, what I found even more interesting and illuminating were the principles behind them. The three things you called out were making sure that anything people are doing is focused on fast flow, rapid feedback and limiting cognitive load, right? For our listeners as well, if there's one thing you want to take away from this, make sure that you're focusing on those in any evolution of your team structures.

Thank you so much to Evan, Manuel and Matt. It's been a fascinating discussion, and I hope our listeners have enjoyed this episode of the Thoughtworks Technology podcast.
