
Team topologies and effective software delivery

20 May, 2021 | 53 min 52 sec
Podcast Host Ashok Subramanian | Podcast Guest Evan Bottcher with Matthew Skelton and Manuel Pais

Brief Summary

We catch up with the two co-authors of Team Topologies: Organizing Business and Technology Teams for Fast Flow to hear about their ideas on enabling enterprises to become more effective at software delivery — and the influence of Conway’s Law, team cognitive load and responsive organization evolution.

Podcast transcript


Ashok Subramanian:

Hello everybody. And welcome to this edition of the ThoughtWorks technology podcast. I'm Ashok, one of your regular co-hosts and I'm joined today by Evan.


Evan Bottcher:

Hi, it's Evan Bottcher here from Melbourne in Australia. I'm the head of engineering for ThoughtWorks in Australia. To give a little background, I wrote an article a few years ago on platforms that included the concept of platform teams. And I do a lot of consulting around engineering organization design and how to shape teams. So that's what I'm bringing to the conversation today.


Ashok Subramanian:

Brilliant, and that seems to be quite relevant and topical given the two guests that we have today to discuss Team Topologies, and who better to do that than the people who wrote the book themselves. So Matt and Manuel, would you introduce yourselves to our guests please? To our listeners, sorry.


Matthew Skelton:

Sure, hi, it's Matthew Skelton here, co-author of the book Team Topologies. I'm here with my co-author as well.


Manuel Pais:

Hi, I'm Manuel. I'm, like Matthew said, the other co-author, and yeah, I do similar work to what Evan mentioned, helping organizations understand their sort of team structures and how to achieve faster flow.


Ashok Subramanian:

So maybe we just start with something to give our listeners, people who might have heard of Team Topologies and maybe even have sort of skimmed it, a brief sort of overview of the book and what sort of prompted it in the first place.


Matthew Skelton:

That's a great question. So, what I realized the other day was, effectively, our book is... You can see it as a 215 page rant against a huge number of things that are wrong with software engineering in the kind of modern world. Because this is born out of our own experience working with organizations and seeing the frustration inside organizations, on the engineering level, manager level and business level, of things not kind of working well. We've been in situations ourselves where we've experienced a lot of the pain that we're trying to deal with in the book. And that's why the book ends up being really quite practical and opinionated and quite directed or directive about how we think things should be arranged in order to make building and running software systems more effective in the current context. So that was our sort of aim in writing it, to set forward this kind of set of patterns that should, we think, help organizations to become more effective at software delivery.


Manuel Pais:

Yeah, it was also based on our experience. There are different inputs to the book. The main one was our experience consulting and seeing problems that different organizations were having in terms of trying to adopt DevOps, continuous delivery, but then they didn't see the results. And then when you go a bit deeper, you see that they often have problems with misunderstandings between teams, lack of communication, the sort of people problems that we know are harder to solve. It's not just having a new tool set, which, okay, brings some benefits. But at the end of the day, if teams are not communicating or if they're over-communicating, so those kinds of problems. So the book helped get that message across on some of the patterns we saw working well and what didn't work so well. And fortunately, a lot of organizations have seen this as a sort of new vocabulary to talk about teams, purpose, and how to evolve the interactions between them.


Matthew Skelton:

Now, interestingly, Manuel and I both have a substantial amount of experience working in what's now known as CI/CD. So that... This is kind of the center of the software delivery world, which is the deployment pipeline, obviously sort of popularized by the great book by Jez Humble and Dave Farley, back in 2010. And we both did a lot of work around continuous delivery and thinking about how organizations are set up. And actually, I mean, continuous delivery, the concepts in there are absolutely foundational, and even since, well, since the book was published more recently, I've realized how central it is to get those concepts embedded. That kind of... The idea of an end-to-end flow of value in the organization using a deployment pipeline is... Without that, then we can't have a fast flow of change.


A lot of the stuff in Team Topologies sort of, it's not going to happen because we need the solid engineering practices underneath. And so our experience working at that kind of... If you like the heart of the organization, the deployment pipeline level, I think shaped our thinking significantly around what the foundations are for an organization that can have a fast flow of change in the software space.


Some other things, some of the influences on the book, I actually did a blog post back in 2013, looking at different kinds of organizations, different team responsibility boundaries. Back then we called it sort of DevOps. I mean, DevOps now means something different, but back then, DevOps was all kind of about thinking about the relationship between development teams and operations teams as they were called then.


And that blog post got lots of traction, lots of interest in the diagrams. They're like Venn diagrams with different colors representing different kinds of teams. And so Manuel and I then kind of turned that into a website called DevOps Topologies, which is still there, all up and open source, well, Creative Commons. And that got even further traction and helped organizations, including Netflix and Condé Nast International, to think about the relationships between different teams and boundaries and different dynamics, basically, different trade-offs for different responsibility boundaries. And that led us to more conversations, and eventually we decided to go well beyond that original Venn diagram, static view of the world, and incorporate many other dimensions which are actually important, things like Conway's law. So there's sociotechnical mirroring between the communication paths in the organization and the likely software architecture that typically results, or system architecture that results, but also things like team cognitive load, which is a term that we've effectively sort of invented or shaped in the book. Applying cognitive load at a team level is a very new concept.


And then making sure that the organizations are evolvable, partly by introducing constraints on the types of teams and the types of interaction between teams, and effectively that combination, those constraints end up actually being what we call enabling constraints in a complex adaptive system. We're reducing the degrees of freedom that this organization can use precisely so that we've got a better ability to listen to the signals that are happening inside the organization. So, that's where we ended up. It was a journey of probably, well, at least ideas, I guess, certainly from... Inspired by the continuous delivery book and a bunch of other things.


Evan Bottcher:

I've got to say that in ThoughtWorks, in our consulting work, it's actually been quite impactful and my... And the creative constraint around the limited number of patterns and the pattern language has been really, really powerful. With the work that we've done, and I know I've done, many of us at the time when we've picked up the book have said, we have about three half-finished slide decks that we've done with clients to describe many of these patterns that have been emergent in organizations that we work with. But your essentially doing that work, to publish and explain these in such an articulate way, has been really, really helpful.

And some of them have been counter to a lot of the trends in the last few years of long lived product aligned teams, your stream-aligned teams, to introduce some alternative patterns that have been really great conversation starters, and actually sort of shattered some defaults that people have gone to and some dysfunction that's emerged as part of it. I wonder how much... When you were putting together the material, how much of it was through observation of existing patterns in teams and how much of it was your ideation of this is a recommended way forward?


Matthew Skelton:

It's a combination of observation of what works. A bit of ideation. But also starting with some principles, starting with fundamentally the principle of really three things. If you want to summarize Team Topologies, it's optimize for fast flow, rapid feedback and limits on team cognitive load. And if you start with those three principles, I mean, the Conway's law stuff is kind of useful, but is not really absolutely at the heart of it in some respects. If you look for organization designs that optimize for fast flow of change toward live systems, rapid feedback from those live systems so that teams can course correct, and limits on team cognitive load, then you will end up with a design, with a set of principles and practices that look like Team Topologies. So that's fundamentally what you would get.

You might have some slight differences. You might have five different types of team, or you might have four different interaction modes, fine, but if you start with those principles, then I would find it very difficult to see how you get something different. Because you want to avoid hand-offs for a fast flow, you don't want to have lots of waiting around for people to do different things. And you want an end-to-end responsibility, so you're very close to the customer, whoever the customer is, but increasingly you can't expect teams to just take on more and more and more and more stuff. So you need to have the cognitive load limited. Therefore, you need to have something like a platform. You might call it something different, but you're going to have to have some of that stack taken away from teams to help them focus on the stuff that's more germane and relevant. Effectively, we started with those principles and then looked and saw what different organizations were doing.

And effectively, looking back, what we ended up doing sort of is reverse engineering what some organizations were doing. So Amazon, AWS, is a classic example, whether or not it's true, of that kind of memo from Jeff Bezos back in, whenever it was, 2002, saying each team will communicate with other teams via a clear API and treat other teams as if they were external. All services will be externalize-able. This kind of two pizza thing. A lot of people miss the point of the two pizza team, it's nothing about food or what have you. It's about a high trust and therefore a high degree of contextual awareness about changes, ability to make those changes very rapidly and course correct. And crucially, we're limiting the cognitive load on the teams by keeping them small. We cannot take on more and more and more stuff in the system. Therefore, we're going to have to compose our total solution out of a smaller number of discrete and, well, nicely decoupled discrete services, which is kind of good software engineering practice. Right? Good engineering practice in general.

So we effectively did some reverse engineering on what was already out there and kind of worked out, well, actually, what's really going on at Amazon? Yeah, two pizza teams sound all nice and cutesy, but underneath that are a whole lot of really key principles that actually are fundamental to building scalable software systems. And so we tried to unpack them from other places too, and looked at, for example, what Spotify were doing, and what was the intent behind the Spotify model, as people started to call it. What was actually going on there that's not just the names of the teams and the kind of roles and that kind of stuff? And so that's what we tried to capture in Team Topologies, this kind of reverse engineering of patterns that seem to work well, or seem to have had some value at a particular point in time, in certain contexts, and try and work out what problems they were trying to solve or what they were trying to solve for, what the principles are behind those external patterns, and reverse engineer that, and then kind of characterize it in a way which was forward looking.


Manuel Pais:

At the end of the day, Team Topologies, as Matthew mentioned earlier, is really focused on if you want your organization to be able to respond faster, deliver faster, and also have healthy teams that are long lived and that are sustainable. And so sometimes people ask us, what if an organization doesn't have those kinds of requirements? And that's okay, then maybe Team Topologies is not the ideal approach for them. We just don't see many organizations these days that can afford not to go fast and adapt quickly, as things change so fast.

But even Team Topologies at the end of the day is sort of a trade-off you have to make, where if you want to introduce the ideas of Team Topologies, the constraints, the vocabulary, and the understanding of why these constraints are useful, that's a bit of effort you have to invest so that you can then look at achieving fast flow and in particular stream-aligned teams. Yes, those kinds of long-lived teams, cross functional with end-to-end responsibility, are hard to achieve, and they need to be kind of grown over time. It's not like we're going to have... Create a new model today and tomorrow we have stream-aligned teams that are high-performing.


That's not how it's going to work. You need to invest in these teams. You need to look at what kind of support they're going to need in terms of platform and enabling teams, other things in terms of engineering practices but also funding practices, et cetera, which is a little bit outside the scope of Team Topologies. But it has to be an investment in aligning to these ideas, these principles that Matthew was talking about, if you want to achieve faster flow, being able to adapt quickly and have healthy teams. And that might not be a trade-off that every organization wants to make, but we're not saying that everything about Team Topologies is going to be easy to adopt and make things much better, which is the fallacy, and not because of whoever wrote the Spotify article or something like that. They were just explaining how this approach was helping Spotify at the time. But then when people copy that and try to just replicate it in their organization without understanding why are we doing this, and what's different, and how do we adapt this to our needs, then it becomes a problem, often with a lot of cargo culting and keywords that pop up and people want to follow without the benefits.


Matthew Skelton:

That was one reason why we wanted to avoid these specific roles, a bit like, I don't know, from Scrum, you've got Scrum Master and what-not. You might have these particular individual roles. We avoid that in Team Topologies precisely because we really want to avoid yet another kind of, I don't know, industry certification drive for people to jump on this bandwagon and say, "Well, I'm a certified this, and therefore, we're now officially Agile version seven," or whatever it is. So we took some care to try to avoid that anti-pattern really and to try to emphasize the need for organizations to internalize the principles, fast flow, rapid feedback, limiting cognitive load on teams.


Some other principles include high trust, looking at the size of groups inside the organization and trying to maximize the trust in there, because if we've got high trust, we can have a faster flow of change because we don't get the distrust that leads to people stopping things or wanting to approve or inspect things. But by coming back to the key principles all the time, then we hope that there's more chance that organizations can actually continuously adapt and evolve rather than just latching onto, if you have a nice fixed set of teams that look like this, then all your problems will be solved. That's what we try to do, anyway.


Evan Bottcher:

I think that transition you're talking to, Manuel, is about organizations that try to adopt an approach. And I've been involved in organizations where they've tried to move towards product-aligned, long-lived team structures, but the underlying architecture does not match. And so there's the cognitive dissonance, I guess, that they bump into when they say, "No, but we want to have teams with independence and autonomy and ability to respond to a customer need," but actually the changes to the systems underneath, they don't. We coined this years ago in the Technology Radar, the Inverse Conway Maneuver. It's a very painful process for many organizations as they realign into a new structure and the architecture underneath has to tear itself apart in order to create that. And it takes time and causes a lot of pain, and can be quite risky for some organizations.

But one of the things that appeals to me around the Team Topologies patterns is this concept of the complex subsystem, which does give us a language to describe that some of the systems are not safe yet to take change in a self-service way or to be placed into the control of the stream-aligned teams. And so they may need some different treatment. A question that came up, I think I saw a webinar recently of yourself talking to this, Matthew, was this complex subsystem team, the distinction between that and a platform, or whether there's an evolution over time between those things. Do they naturally fall into some more self-service platform? It's interesting.


Matthew Skelton:

So part of the challenge here is that we're seeing a fundamental shift in organizational capabilities based on modern software, particularly cloud software, but cloud inspired software we can say, things like infrastructure as code, stuff like this. Going back 20 years, making changes to how your business runs or the kind of services you provide, it took some time. Now, potentially, particularly if what you provide has an online digital element to it, maybe all of it is that... But let's say insurance back in the day, 20 years ago, someone would maybe visit a website, but probably just telephone or send an email or something. Now it's all self-service via the customers themselves. They can log in. They can change their insurance details. They can request a new quote. And a lot of the calculation of that new insurance premium is done automatically or semi-automatically. It's digitally enabled.


We're not saying every organization is a software company. I think that's misguided. But every organization now that wants to maintain a competitive or some differentiation can use software to do so. And so we're able to make changes that are much more rapid. What this is showing up is that lots of organizations often don't even have a clear picture of what they provide or their purpose. And approaches like Team Topologies, approaches like domain-driven design, approaches like data mesh are highlighting that lots of organizations have a huge lack of clarity in what they actually do and what their purpose is. And so it's no wonder it's difficult to align to the business purpose because there isn't a business purpose, or it's a very mushy or very ill-defined business purpose.


And in the past, that was less of a problem because things moved so slowly. You had these systems that were, I don't know, owned or managed in a way which needed big changes to happen one after another, and then multiple different people coordinating that change in order to get a big change out the door. Now, partly because of technology but partly because of other techniques, we're able to make changes much more rapidly. And therefore, the organizations that are succeeding are those that have aligned their technology architecture to the streams of business change or streams of organizational change that are needed. And yeah, the approaches like Team Topologies are highlighting this mismatch basically between the organizational business practices of the past and the ones that are possible with new techniques and new technologies.


Manuel Pais:

Yeah. And you mentioned the Inverse Conway Maneuver or Reverse Conway Maneuver, which was, I think, to a large extent promoted by people like James Lewis from ThoughtWorks. To actually do it right, I think there are many aspects to consider. One is, I think, what Matthew just said around if you don't understand what your actual business streams of value are, then you're not going to be able to match the architecture even to those streams of value, because things are going to be coupled and you don't have a good understanding of what the architecture services are, if you like, that you want to decouple. And then there are other layers even of coupling, where you might even have an architecture that looks more or less decoupled, but then teams still depend on each other because they share infrastructure or they share tooling or they are interlocked with some processes in the organization. And so we still don't get the autonomy and more independent teams working on different services because of that. So there are several layers, but starting, obviously, with what Matthew said, if you don't have clarity on what your business streams are, how can you even align either the system architecture or the teams to those streams?


And going back to your question about complicated subsystems and platforms, so actually we call them complicated subsystems because we want to avoid the complex terminology because of [inaudible 00:23:39] and the fact that's usually in situations where you have basically emergent things and properties that you cannot really control. But anyway, so for a complicated subsystem team, the starting point is that it would be too much cognitive load for a stream-aligned team to be responsible for this complicated part of the system. So that's a starting point, which means you might have one complicated subsystem which is only being used by one stream-aligned team, and that still makes sense because of reducing cognitive load on the stream-aligned team. What we see in terms of evolution in general is that the complicated subsystem might be in a constant flux of changes, but for those systems, at some point, the cadence of change starts to slow down. And as the technology evolves, in many cases you see better solutions, more third-party solutions.

For example, I worked with systems that used face recognition subsystems, which were, in fact, complicated. You needed a PhD to work on that, but these days you have good solutions out there. So you could at some point expect to move the complicated subsystem, with some changes, into a platform service, where you rely more on the third party perhaps, or it just has a very slow cadence of change. It doesn't justify having a team around it. If we have the right documentation support, then it can become part of a platform. In fact, we expect a complicated subsystem team to adopt very similar behaviors to platform teams in the way they provide that service or subsystem to other teams.


Matthew Skelton:

Yeah, we didn't put it in the book, but we've since realized a complicated subsystem is like a mini platform. A lot of behaviors are very similar. We don't use exactly that terminology, but effectively that's how you should see it.


Manuel Pais:

It's one of the challenges we've seen now that Team Topologies has been out for about a year and a half and more organizations are adopting it. We see some organizations that think they need more complicated subsystem teams than they actually do. And that's, I think, related to the fact that these seem more similar to the traditional component teams, which when you look at fast flow are typically a bit of a problem, causing bottlenecks when you have many teams that depend on the same component team. So that's one of the challenges, understanding we should limit as much as possible the number of complicated subsystems and try to move them into the platform when possible.


Evan Bottcher:

Yeah, there's no real limit to the people who can read some insight and then apply their own world view to it, and then say, "Okay, yes, we've got 1,700 complicated subsystems that are actually just component teams each working in a release train across the whole organization."


Manuel Pais:

Yeah. It goes back to what we were talking about earlier, trying to copy/paste these models without actually getting the underlying principles and ideas and why we are adopting this model.


Ashok Subramanian:

I think, just as a follow-up on the complicated subsystem, one of the challenges we see quite often is... And maybe this also mirrors some of the points Matthew and Manuel were making about the purpose of an organization. A lot of the time, I think we see that when you go and try and identify what the "business," business obviously in double quotes, is trying to do or achieve, you sometimes end up getting pointed towards, you need to go and ask the team that manages system X or whatever.


And really, it's almost the flip, with the technology driving or defining what needs to be done. And in those sorts of situations, systems tend to have become quite large, or they tend to almost define and control what's happening within the organization. In the transition of those complicated subsystems into what might ultimately become multiple stream-aligned teams, what are the things you would suggest people consider or look out for? Or what might that transition potentially look like?


Matthew Skelton:

That's a nice question. Before we wrote the book, we actually did some work for an organization, a global organization, but this particular part of it was based in the UK. And Manuel and I were working on getting continuous delivery in place, so the deployment pipelines and stuff like that. But there was a huge monolithic system in place already, and there were some really interesting discussions to watch at the time, to look at the dynamic of different groups in the organization effectively positioning around this technology.


Manuel Pais:

And also culturally [crosstalk 00:29:46].


Matthew Skelton:

Oh, yeah. Culturally, for sure.


Manuel Pais:

In different countries, you need to be mindful of differences. Even the national culture is different. And I think that had an impact in that organization as well.


Matthew Skelton:

You've got to be careful about some of the assumptions you might make. So in this particular case, we actually started off assuming that we would actually be able to get some aspects of this monolithic system under something that looked a bit like continuous delivery with some automated tests. It turns out that this particular system, the system itself had been sold from one multinational technology provider to another about four or five times, which indicates probably something about its suitability in general. But anyway, turns out it didn't have a proper way to test it, and all of the logic resided in database tables, which made it extremely hard to test and tease apart. And so then there's actually a really good indication that you actually need to have a proper low level understanding of the technology sometimes in order to be able to work out what's the scale and size of this problem. You can't just immediately assume, actually, that you can apply some of the patterns.

And that's a really good example of where the technology was driving the kinds of teams that were needed in place, precisely because you... It took 40 minutes to start a developer environment for the system, 40 minutes on a virtual machine, on a massive virtual machine in the cloud. I think they managed to get the build time for the system down from something like 48 hours to 21. They thought that was great just to build a new version of the software for the system. And at the time, the people there were like, "Oh, wow, this is amazing. We've more than halved the..." Well, yeah, but that's still way too long. I think eventually they got it down to something like four hours. But again, that's still [crosstalk 00:31:46] expect.


Manuel Pais:

I think to have a proper live environment that could scale and cope with demand, it took two weeks just to set up the infrastructure.


Matthew Skelton:

Yeah. Maybe these are edge cases, maybe not. But there's definitely some situations where you certainly don't want to over promise what you can achieve. And maybe in some of these situations, the right thing to do is to hide some of that awkwardness behind some interface to allow you to innovate around the edge or on top, and then focus on maybe getting some things in place where you're providing APIs into this older, more difficult system. This is all standard stuff with mainframes and things like that. But certainly, take the time to do the proper technical deep dive into the potential limitations of whichever system you're working with.


In fact, this particular system was more limited than mainframes because at least on mainframes you've got the ability to do virtualization and some modern testing tools. This particular thing was a real mess.


So yeah, don't over promise. Don't expect you can just rush in there and retrofit something that's designed for literally the two words that are on the cover of the book, fast flow. That's what it's designed for. We should not expect to be able to retrofit that onto a system that was designed for something else, designed for immediate consistency, for example. Then that's a single relational database or a single logical relational database. If they've optimized for immediate consistency, fine. It's going to feel very different from something where eventual consistency and separate streams of stuff is what we're aiming for.


Manuel Pais:

So in terms of that transition to stream-aligned teams, I feel like, as we were saying before, the first step is starting to understand what those streams actually are. And that's an evolution as well. In this example that Matthew was giving of this organization, it's not about stopping everything and saying, "Okay. What we need is to," let's say, "adopt DevOps because it's going to solve all these problems. And now we're going to change or we're going to adopt Tribes or we're going to adopt whatever is the recent trend," and expect that to solve our problems when, in fact, we need to start with looking at what the streams are and start evolving from where you are towards a better place or a better fit for faster flow.


But it's not a revolution. It's not a big reorg that typically is going to achieve that. Sometimes it's going to be a sort of painful process where we're starting to figure out, like in this example, that there are technical issues that prevent us from achieving faster flow. And then there are team organization issues. In that example, there were some behavioral issues of people who are used to working in a certain way and don't see themselves as being bound to internal or external customers. So they just want to do their work as they've always known. So anyway, there are a number of factors, behavioral, technical and organizational, that mean it's going to be a journey to get to a better place. And so when we were helping clients and their teams, what we're trying to do is not, let's create a new model with only stream-aligned teams or mostly stream-aligned teams and the platform, et cetera. What we're doing is understanding where you are now, which teams you have, and then starting to help those teams figure out, are we more of a stream-aligned team? And if yes, what are we lacking or missing to be more in line with the ideas of a stream-aligned team, if that's what we want to be?


And so often then we get into the responsibilities and the capabilities that the team has. People like John Cutler, for example, do a lot of great work and posts around product development but also the team dynamics aspects. So he has some really good stuff around the domains of responsibility for stream-aligned teams, where you might have teams that basically just do build and test. Then you have teams that actually are able to support their own service in production. But then you still have a lot of teams that have very little input into the actual product development or service development, understanding the customers, what do the customers need and then being able to experiment. You start seeing this range of responsibilities that you would expect from a real autonomous stream-aligned team or self-sufficient stream-aligned team. And so it's not going to be a one-step change. But we can look at teams today, and with these patterns in place and these topologies, we can start looking at what's missing and taking steps towards that.


Evan Bottcher:

I've noticed in the last few years, there's been such a big emphasis, and I know myself it's been a big part of my consulting, around organizations that want to introduce some internal platform to be able to go faster and reduce that burden. There's a beautiful description in the book around cognitive load as a way of describing what you're trying to reduce to enable teams to move faster. I wonder if we could explore that a little bit and particularly around the concept of the importance of product thinking when it comes to building that platform.


Matthew Skelton:

For us, it was a real light bulb moment when we started thinking about how this idea of cognitive load would apply to whole teams. And it resonated with things that we'd effectively been seeing for many years anyway. But applied at a team level, it suddenly starts to make a little more sense. So we talked about the concept of team cognitive load in the book because ultimately what we want teams to focus on is the thing that is important for the organization. If you're an organization that's working in banking, then a lot of teams will be focused on banking because that's the thing which is a differentiator. Your team is primarily focused on the differentiating aspects of the work because, particularly with the pace of technology change now, there's so much stuff which is coming along so quickly from external providers, cloud providers, whatever, that if we're working on things which are non-differentiating, there's a real risk that we're just going to get left behind.


And so we've got this idea strongly that we need to be focused on the things that are differentiating for the organization at this point and try to minimize the cognitive load on teams for things that are extraneous to that. And this idea of minimizing extraneous team cognitive load for stream-aligned teams is what drives the other three team types, really. Why we've got these other three team types, enabling team, complicated-subsystem team and platform team, is primarily to reduce cognitive load on stream-aligned teams. And that's it. In the past, an internal platform might've been there to share hardware or to share license costs or something like this. And that was a reasonable design decision back in the day when hardware was extremely expensive and scarce and when you only had one provider to choose from if you wanted a relational database and so on. So it's not that that stuff is inherently bad. It's just that the dynamics of the business landscape have changed. Like we said before, we're trying to optimize for a fast flow of change.


If you've got a stream-aligned team that wants to go very, very quickly in a particular area, they have the choice to work on the business-domain-relevant things. Let's say it's financial transactions. If they want to go really quickly, then you could allow them to build their own infrastructure. If you want to allow them to go very, very, very, very quickly, maybe they need to create their own special database. Maybe there's nothing currently in the market that suits them. And the right thing to do at this point in time is to create their own database. Fine. If that helps them to go quickly, safely and deliver customer value, then let them do it if the business case is there and if the organization is willing to pay for it. And if that gives them the autonomy and the speed and the safety, then why wouldn't you do it if the organization has got the money?


At some point, that team is going to be dealing with financial transactions for consumers and building a lot of infrastructure and maintaining this custom database. At some point, that cognitive load is going to get in the way of delivery. So at that point, you've got a choice. What do you do? Do you change the skills mix? Do you send people on training? Do you switch the technologies to make them easier, move up the stack, if you like, to a simpler language or whatever? Or do you do something like move some of that stuff into a platform? But all of the thinking around platforms really should be driven by this combination of fast flow, rapid feedback and limiting cognitive load. From our point of view, that should be the driver. That should be the thing that tells you what the right boundaries are or that informs what those right boundaries should be, not just, oh, we definitely need a platform because this stuff is infrastructure. I don't think that's true. It's entirely reasonable for a stream-aligned team that's focused on delivering pizzas or whatever to manage its own infrastructure if that's the right thing to do for speed and safety.


But you've got to balance that with the kind of available cognitive load, or the available cognitive capacity, based on, do they have enough time to think about the main business domain? And if managing that infrastructure means they don't have enough time to focus on selling pizzas, then you've got a problem, and you want to take some of that responsibility away. So it's this trade-off, right? It's an engineering trade-off, and from our point of view, that's what really should drive the thinking, particularly around platforms.


So conversely, the starting point for a platform should be: whatever the platform does, however it does it, it should not increase cognitive load on the teams that are using it. That's, for us, a really important starting point. That should be kind of the North Star for the platform: are we increasing the cognitive load on teams that are using our platform? And if we are, we're doing it wrong. And then we can start looking at applying things like product management techniques and self-service, and all these other things as well, which are super important, but if we apply all of these great techniques and we're increasing cognitive load on teams using the platform, we're doing it wrong.


Manuel Pais:

Yeah. A very simple, I guess, straightforward example that maybe many listeners will relate to, or not. But if your idea of a platform is, for example, we're providing Kubernetes to our internal teams, and what we do is create some clusters and then tell teams to use them and deploy their services, then that is very likely increasing cognitive load, and probably by a lot, because now you're asking the development teams, or stream-aligned teams, if you like, to learn a whole load of new information about this new platform. And, yes, there's great Kubernetes documentation, but if you're asking teams to read that to be able to do the things that they were doing before on a totally new platform, that's a massive increase in cognitive load.

So I would say platform teams should almost have, and I say this in the training that we do for platform engineering, a kind of startup mentality where yes, first of all, you're providing some technology and some new ways of doing things, or better ways, which is kind of a starting point, but then how do we make this easy to consume, easy to understand, and also how do we actually provide even more value to our internal teams, by understanding what their specific needs are? And how can we not just provide the technology access, but actually provide the good abstraction layers, and the differentiators that Matthew was talking about, for our platform as a product?

And in your definition, Evan, that we reference all the time about "What constitutes a modern digital platform?", you talk about how it should be a compelling internal product, right? So I think that definition is really good because compelling means, okay, do the customers of the platform see the value? They see, "Okay, this makes my life easier because now I don't have to understand all these details about Kubernetes deployments or what have you. You have provided me a better abstraction, a higher-level abstraction, so that I can just focus on the things I need to do for my product or for my business stream and focus on that. And I know that with a simple configuration file or something, I can get the service out into the live environments."

And obviously that applies to all sorts of other kinds of platform services you might have. So think of things like, what is the value proposition of the platform and its different services, and what is the actual value we're providing internally? And how does this differentiate from the development teams directly using some third-party external option? We have that internal context of what our teams need, or at least we can talk to them directly and understand what they need, and that should give us a very strong competitive advantage, right? Compared to external organizations. But we don't always see that being leveraged adequately. So many platform teams think, "Well, we're just making technology available internally." And that's not really the value, and it's not necessarily reducing cognitive load.

And finally, if you have that startup approach or that product-driven approach, then you should be very aware of the adoption cycle, right? So the adoption cycle for products also should apply internally to platform services, where you have early adopters, people or teams who are more kind of engaged and willing to take on the rough edges of the initial versions of the service, and they're willing to provide feedback, and then you'll have the early majority and late majority. And so, you have to understand also, or have at least an idea of, where our service is positioned in the adoption life cycle internally, because we might need to do different things. If we're now targeting the late majority, then we need to understand why they are not adopting this service. What are the kinds of friction that they might have, or are we missing out on some workflows that are important for certain teams that are not supported by the service, and so on? So I think it's crucial to have that mindset of the platform as a product.


Matthew Skelton:

One phrase that we came up with after publishing the book, sadly, and in retrospect it would've been great to have this phrase in the book, is: the platform is a curated experience for engineers. And that's the starting point. The starting point is not technology, the starting point is, what's the experience for people using it? Does it increase their cognitive load? Does it make their lives easier? Does it help them to focus on their main area? And curating the experience then starts to get us thinking about product, and design thinking, and user experience, and all of these kinds of areas that are relatively well understood now in 2021, when I'm recording this. Things that are well understood, particularly in terms of consumer software services. So we're taking that rich body of awareness and practice and applying it to an internal platform.

So, it's not a straightforward translation, but the key thing there is that there's a whole number of people in the industry who have got these skills already. What we're doing is applying it to internal customers, the stream-aligned teams, software development teams, engineers, testers, and so on, and applying it to this internal platform product. So we do not have to invent a whole bunch of new techniques and things; we can sort of reuse and adapt the existing techniques that have already been proven. I mean, proven at scale by cloud companies that are valued at billions and billions of dollars. They've used these techniques and proven them to be at least successful in the wider market. And so, starting with that sense of needing to curate an experience for our customers, the internal teams, is, I think, again, this kind of North Star, a good way to guide the decisions we take about a modern platform.


Manuel Pais:

Well, we still see many clients where they sometimes don't even have a product management mindset, or even the role, inside the platform teams. And so, with some clients we have actually recommended not just to start having product management roles in the platform, but that actually your most experienced product managers should be in those teams, because that's where there's the least awareness. So maybe in your development teams or stream-aligned teams, it's more common to find this focus on product development and user experience and so on. But actually it's in the platform where you probably need the most experienced people to be helping these teams get different ways of working, and looking at who the customers of this platform are.


Evan Bottcher:

Well, I'm sure that we could explore it further. There's a whole book and a bunch of additional material, and lessons to be learned here. I think we should start to wrap this discussion up. Where would you suggest that listeners go next in order to find out more? There's obviously the book, but are there additional materials that maybe you're putting together?


Manuel Pais:

Yes. So there's our website, teamtopologies.com, which is where we add new industry examples. We recently published some infographics that have been quite well received, on getting started with Team Topologies, and Team Topologies in a nutshell, which help people at least start talking about this in the organization and understand the why of Team Topologies. And we have also launched a Team Topologies Academy, where the first course that we have is a sort of distilled version of the Team Topologies book for people who want to, within a couple of hours, start getting the vocabulary and the basic understanding of the principles and the patterns in Team Topologies. And that is quite useful when organizations want to adopt some of these ideas, and they want to have kind of a shared understanding across all of their staff. So yeah, there are all those resources, and we also have a number of repositories on GitHub. So go to github.com/teamtopologies, or you can go to teamtopologies.com/tools, which is basically a set of templates, and assessments, and useful techniques that we use ourselves with clients when we're helping them understand that and start evolving towards a faster flow.


Ashok Subramanian:

That's great. I've looked at the website, and there's definitely a bunch of great resources there, including, I think, the templates that you've referred to, Manuel. I've used some of those myself. So it's definitely worth a look. I know we could go on for, I'm sure, another hour and a bit quite easily with the questions, but I would say, to me, some of the main takeaways from this were, while there are the patterns that you've identified, what I found even more interesting or illuminating was the principles behind them. And the three things that you called out were around making sure anything that people are doing is focused towards fast flow, rapid feedback and limiting the cognitive load, right? I think for our listeners as well, if there's something that you want to take away from this, make sure that you're focusing on those in terms of any evolution towards team structures and so on.

Thank you so much to Evan, Manuel and Matt. I think it's been a fascinating discussion, and I hope our listeners have enjoyed listening to this episode of the ThoughtWorks Technology podcast.
