
Making developer experience a reality

10 June, 2021 | 41 min 7 sec
Podcast Host Rebecca Parsons | Podcast Guest Tim Cochran with Keyur Govande and Pia Nilsson

Brief Summary

There’s often lots of talk about how companies can make their developers more productive. But it may be more useful to think about developer effectiveness: how to ensure they’re building the most useful products for their customers. This is about providing an environment that allows developers to be effective. Our special guests from Spotify and Etsy give us their unique perspectives.

Podcast transcript


Rebecca Parsons:

Hello everybody, my name is Rebecca Parsons, I'm one of the co-hosts of the ThoughtWorks technology podcast. And I would like to welcome you today to a discussion on developer effectiveness. I am joined by my colleague, Tim Cochran, and we have two special guests as well, Keyur Govande and Pia Nilsson. So first, Keyur, if you'd give a brief introduction to yourself.


Keyur Govande:

Sure. Hi everyone. My name is Keyur. I am currently the Chief Architect and VP of Engineering at Etsy, and I've been there for the last decade. Before Etsy, I worked at PayPal for four years as a software engineer.


Rebecca Parsons:

And Pia.


Pia Nilsson:

Hi everyone, I'm Pia, and I work at Spotify. I lead the Platform Developer Experience Department there, which I have been working in for five years approximately. Before that I was an engineer for over a decade. And I have been a huge fan of the ThoughtWorks books and the articles.


Rebecca Parsons:

And Tim, why don't you tell us a little about yourself and then tell us a little bit about what this developer effectiveness stuff is all about?


Tim Cochran:

Sure. So my name is Tim Cochran. I'm Technical Director for the east market at ThoughtWorks. So, I'm looking after the tech strategy and our engineers on the east coast. I've been at ThoughtWorks for 16 years; similar to Pia and Keyur, I was a developer, and now I'm not. And at ThoughtWorks, we work with all different kinds of companies, different stages, different kinds of projects. And we'll talk more about that.


Tim Cochran:

So on the topic of developer effectiveness, so this came out of a number of different things. Some of it was, as I mentioned, we're able to work at different kinds of companies. So some of them are digital natives, like Etsy and Spotify, and others are enterprises, different sizes, completely different technology stacks, and then completely different stages of their journey towards DevOps nirvana and embracing digital practices.


Tim Cochran:

So working with these different companies, we were able to see some of the different engineering practices and some of the different outputs, the ultimate output in terms of product, and also in terms of the technology platforms and the environments that they've built. And what I did, with some of my colleagues and some of the research, is try to identify what some of those things are from an effectiveness point of view.


Tim Cochran:

Now, the other thing that happens is we get a lot of people asking us, how do I make my developers more productive? And unfortunately it quickly turns into things like, I'm sure everybody's been through this, measuring lines of code or commits or things like that, very tactical measurements that, in my opinion, are not particularly useful. And actually they are often very easy to game, and will drive bad behavior.


Tim Cochran:

But it's still a valid question. How do I make my team or my developers more productive? So to separate it, we use this term developer effectiveness. So it's more about how do I make my developers effective? How do I make sure that they're putting their energies and time to building the most useful products for their customers essentially. So a lot of it's about working, not necessarily working longer or harder or hiring better smarter people, it's just working smarter, using your time better and making sure that the company is providing that environment that allows developers to be effective.


Tim Cochran:

So, that's the broad concept. And I can go into a bit more detail about some of the principles behind it, if you'd like. So, what we did essentially is take a look at what do developers do all day, what is most of their time spent doing? And we did a fair amount of research on that.


Tim Cochran:

It's not surprising to most engineers listening that there's a lot of misused time. There's a lot of time spent in meetings, in chasing down problems. Like debugging a production issue, but you don't have the right information, so you're doing trial and error. Let me try this, see if it works. There's lots of trying to find information, so for example, I need API documentation or technical documentation. And then sometimes it's a lot of time spent building software or doing stuff, but those things aren't necessarily correlated with or benefiting the customers in the right way. So, there's a lot of that.


Tim Cochran:

And then what we also saw is a lot of small amounts of friction. This is when we start comparing what we describe as the highly effective versus the less effective companies: it isn't just one thing, it's these small amounts of friction. The highly effective companies have done a lot of work to make everything fast and friction-free. And it comes down to these small amounts of problems.


Tim Cochran:

And what we tend to find is with companies, when they're trying to get better, they focus on something big, so it's normally like, all right, if we move our services to the cloud, or we move to a microservice architecture, or we bring in this platform or whatever, it'll take us two years. But at that point we'll be perfect, everything will be great.


Tim Cochran:

Of course, that's not the reality. And actually what happens quite often, over the last few years, is somebody's jumped on one of these bandwagons and they've come to us after trying to do it for a couple of years, and they've actually got slower. I'm not going to beat up on microservices on this call, but that's a common situation: you've brought in something complex that does have benefits, but that has also made things more difficult. In the example of microservices, you now have a much more distributed system. And then what's happened is they've not created the right environment, and the developers are no longer effective.


Tim Cochran:

And sometimes it took a long time for them to realize that. Because it's a new technology, and they're spending time on it, it's just this learning curve. Then at some point, someone points out, hey, we've been doing this for two years, what's going on here? Aren't we meant to be going super fast if we're doing this new technology?


Tim Cochran:

So, what we tried to do is look at, okay, what are the things that you need to focus on? And we're not saying you shouldn't adopt new technologies and new platforms, but there are certain things that we have to make sure that we get right. And this is where we start to talk about these feedback loops. One way of looking at software development is as a series of feedback loops.


Tim Cochran:

As an engineer or as a product delivery team, you do something and you want to get some validation. That validation might be from a peer, a person. It might be from an automation. It might be from yourself, you're checking yourself. But I want to do that series of feedback loops. And these are big and small.


Tim Cochran:

So, I can give some quick examples here. A typical one, obviously, is what the developer does throughout the day, which is: I write some code and, in my local development environment, I want to know that it works and does what I expect it to do. So this is the simplest one. And I do that by writing the code, having it loaded into my development app server, and looking at it. Okay, did it compile, did it work? Does it look like it worked? Does my feature work? I'm also running tests, the tests that I wrote, my regression tests: does my code work?


Tim Cochran:

That is a feedback loop that, depending on the organization, the engineer does 10 times, a hundred times a day. And of course, everybody knows this, but what we found interesting is sometimes organizations are not paying enough attention to these small micro feedback loops. They focus on things that are easy to grasp. So for example, I have manual QA, very easy to grasp that those should be automated. Or I'm using a data center instead of the cloud, easy to grasp.


Tim Cochran:

But some of this hidden friction, these small amounts of friction, doesn't really get optimized as much. And some of that's because it's hard to measure, or hard to grasp: something that takes two minutes but actually should take 30 seconds or 15 seconds, what that does, why that slows people down. And that's where we start to talk about the compounding effects.


Tim Cochran:

The idea is that, okay, these frictions, these time periods add up, but it's also the context lost. It's the cognitive overhead. It's the "okay, if I have two minutes, I'm going to go and play ping pong or go and take a walk, or check my email," those kinds of things. And this is what we've noticed when I worked at some of the digital native companies, the ones that are highly effective: they've spent a lot of time getting this development environment to be very, very fast. So that's the small ones, and you can take that further. I don't want to go on about this too much, you can look into the article for more information, but there are other examples, like: I need to debug a problem. And then you have larger feedback loops. So for example, I'm going to create a new service and I want to put that into production, and it should have all the bells and whistles. It should have observability, a deployment pipeline. Or I'm standing up a new team. I'm changing teams. I want to become productive. Right?


Tim Cochran:

And then another key one that I missed was discovering information. I want to find information about an API that I depend on, and there's lots of different things there, like, "Okay, can I find technical documentation? Do I have access to that team? What are the steps? How long does it take to go through that?" Yeah. So that's the basic concept of those key feedback loops that we're recommending companies optimize if they want to be effective.


Rebecca Parsons:

So Keyur, why don't you tell me a little bit about your environment at Etsy and how these issues of developer effectiveness play out there?


Keyur Govande:

Sure. Like I said before, I've been at Etsy for a little over a decade. And in that time, Etsy has grown by an order of magnitude. So things that used to work well when I joined in 2011, we've needed to tweak and tune them for the scale that we operate at today. And as some of you may know, Etsy was one of the leading companies to adopt continuous integration, continuous deployment, DevOps, and blameless culture. And one thing that I wanted to call out from what Tim was saying: engineers come to work to do a good job. That is one of the key tenets of blameless culture. You're here, you want to be effective. So as an infrastructure team, one of our missions is to get out of people's way so they can do their job as best as they possibly can.


Keyur Govande:

So for the last, well, a little over a year, we've been investing in developer experience as one of the biggest initiatives that the infrastructure teams work on. And there are a few different pillars to it. A little bit of context about Etsy: we are a PHP, Apache, MySQL monolith, and we have been for the last decade. And we have a lot of engineers working in that shared space. So one of our goals was how can we continue to be effective? Because we actually like our monolith. To Tim's point, microservices are great, but we have over a decade of experience operating this monolith. We understand where all the sharp edges are. And we think we have a little bit more time before we're forced to break it apart. And we want to maximize that time period.


Keyur Govande:

So an area that we've focused on is deployment velocity. As the number of engineers grows, we found that we needed to invest in tooling and some systemic changes to how we do deployments in order to scale and keep deployments happening as quickly as they used to 10 years ago.


Keyur Govande:

A little bit more background about how we do deployments: at Etsy, we rely on the engineer who wrote the code to be the one to push the button to go to production, or to be part of a group of engineers that does. But a human presses the button that ships the code onto production and makes it live. And we think that is really important because it creates a clear sense of ownership and accountability for maintaining the health of production. So the way that we do this is we have what we call a push queue. You join in, a few more engineers might be with you on the train, and then all of you collaborate to get your code to production.


Keyur Govande:

But what this means in practice is that as the number of engineers grows, the queue cannot scale, and people might be waiting longer and longer to get their code out, which is a suboptimal developer experience. So we have put in a bunch of time to figure out what are the steps that we can remove from the process? We want to keep the accountability aspect, but we also want to take away the tedium or the toil that might exist in the process. So we've spent a bunch of time optimizing the actual systems that do the deployment, trying out newer, faster methods of getting code out into production, and also providing a tighter feedback loop so that when code hits prod you know that it's good and you can move on really quickly.


Keyur Govande:

Another area that we've invested in a lot is actually around our machine learning and data science. This one is particularly interesting because, as the field is in its early stages of maturity, the problems in this space are similar to the problems that existed around developing web applications a decade ago. You want to do continuous integration, continuous deployment. You want to be able to test fast and shorten the feedback loop. But the domain is such that you're operating on large amounts of data. The amount of computation required to gain confidence is a lot. The feedback loop might be hours long. So we're trying to figure out what we can do in order to bring those best practices that we already use in one side of Etsy into this other area, where there is now an explosion of engineers working in this space. So those are two examples of areas we have focused on in the last year or so.


Rebecca Parsons:

Great. Thank you. And Pia, why don't you tell us a little bit about how things work in Spotify?


Pia Nilsson:

Absolutely. So when I joined Spotify in 2016, we hadn't built our product, Backstage. And the challenge that we were facing at the time was that we were in hyper-growth. We were hiring super successfully, but our metrics were showing that we weren't increasing our productivity. We were, of course, using the Accelerate metrics, and we were relying most on the deployment frequency metric. However, it was a higher-level metric that was helping us notice this the most, which was our onboarding metric.


Pia Nilsson:

So back in 2016, we noticed that our onboarding time was increasing rapidly, to over 60 days for one engineer to be called onboarded. That is the number of days it takes for an engineer to do their 10th pull request. So we were hiring really successfully but becoming slower, you could say. So what did we do? We started asking our engineers. We ran user research to figure out what was blocking everyone's productivity. And they came back very strong with two main problems.
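The onboarding metric Pia describes, days from an engineer's start date to their 10th merged pull request, is simple to compute once PR merge dates are available. Here is a minimal sketch; the data shape (a plain list of merge dates per engineer) is an assumption for illustration, not Spotify's actual schema:

```python
from datetime import date

def onboarding_days(start_date: date, merged_pr_dates: list[date], n: int = 10):
    """Days from an engineer's start date to their n-th merged pull request.

    Returns None if the engineer hasn't merged n PRs yet.
    """
    if len(merged_pr_dates) < n:
        return None
    nth_pr = sorted(merged_pr_dates)[n - 1]  # n-th earliest merge date
    return (nth_pr - start_date).days

# An engineer starting March 1 who merges a PR every other day from March 3:
start = date(2021, 3, 1)
prs = [date(2021, 3, 1 + 2 * i) for i in range(1, 11)]  # March 3, 5, ..., 21
print(onboarding_days(start, prs))  # → 20
```

Aggregating this per cohort (for example, the median over everyone hired in a quarter) gives the trend line Pia's team watched move from 60 days to 20.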


Pia Nilsson:

First, it was the context switching. The context switching was because we had a very fragmented ecosystem. So why did we have a fragmented ecosystem? It has to do with our autonomous culture. At Spotify, every single team is like a little startup, and it's free to charge ahead and reach its mission by itself. So infrastructure has been built in isolated islands. This is very conducive to speed, but when we grow, that's where stuff starts to break down. So of course, this leads to a lot of cognitive load for our engineers. Hence, the context switching as the number one blocker.


Pia Nilsson:

The number two blocker that the engineers brought to us was it's just hard to find things. And you can, of course, see this. It's a fragmented ecosystem. So which service should I be integrating with as an engineer? Should I use the user data service that the customer service team has built? Or should I use the slightly different user data service that the premium team has built? Or should I just go ahead and build my own? This of course, leads to further fragmentation and we're back to problem number one. So those were the two things that we discovered in our user research.


Pia Nilsson:

What we did about this is that we realized, of course, we needed to have one place for everyone to go and find things, so that we stopped this fragmentation spiral. That's why we started building Backstage. Building Backstage for us was an interesting journey. As we have this autonomous team culture, we couldn't just build a central place and say, "Please use this now, everyone." So we had to build it with the entire R&D organization. And we did this through code owners, the code owners feature in GitHub, which was really helpful for us. So that means a team out there in premium builds their product, and then they keep owning that product, but it's attached to the Backstage Git repo. So it pops up in Backstage, but the ownership is completely distributed out to all the teams. And that's how we grew Backstage organically throughout these five years.


Rebecca Parsons:

So why don't you tell us a little bit more about what Backstage actually does?


Pia Nilsson:

Oh, great. So Backstage tries to solve three use cases. It helps engineering teams to create stuff, to manage stuff, and to find stuff. So creating, you can imagine: data pipelines, backend services and, of course, web UIs, ML models, all the things that you love. That's, of course, where we try to move in with our standards and best practices. So it's a really nice feature for a platform organization to have this one way. We call it a golden path at Spotify.


Pia Nilsson:

Then we have the manage use case. So engineers can go to Backstage and see all their services and the status of them, of course. We connect them to various other systems, CI/CD, of course, and monitoring, logging, tracing, all of that nice stuff. Then you have discoverability. Discoverability is interesting, as it covers both finding teams and their ownership and finding systems and who owns them. Another feature in our discoverability, or explore, use case is to find platforms. So Backstage is also trying to solve this problem at a higher level, because sometimes you need to find a platform and not just a system. The bigger your company gets, the more platforms you are starting to integrate towards instead of products or systems.


Rebecca Parsons:

Well, and it seems like what you're talking about here is this word that so many people get nervous about, which is governance. How do you guide people to do the right thing? I like that. What did you call it, the golden pathway?


Pia Nilsson:

Yeah.


Rebecca Parsons:

We have other clients who refer to it as, "Okay, well, this is the paved road. If you follow what we've asked you to do, everything will be taken care of. If you want to go off-roading, you can do that. But here are the things that you have to take care of yourself in order to fit into the broader structure." How do you get inputs from the people who are using your platform about how useful it is? I mean, ultimately, these kinds of things that you both are talking about, the customer is actually the developer. So what are your mechanisms? [inaudible 00:24:47], maybe you can address that first. How do you go about this governance process in determining how people are supposed to really be working?


Keyur Govande:

That's a great question. So partly at Etsy, the problem is a little bit differently shaped than it would be elsewhere, because we are a monolith. So the paved road is the monolith's way of doing things. It's a monolithic PHP code base, a JavaScript stack in the same monorepo, and then a bunch of shards. So in terms of building products, the paved road already exists, because what you get by working in the monolith is everything that Pia just mentioned. You get CI/CD. You get all of the logging and monitoring that you would expect, tracing as well. You get an SRE team attached to maintaining the website. So when something happens, you know who to page and you know how to get help when things are not going so great.


Keyur Govande:

The interesting piece for Etsy at this particular moment in time is that we have moved to the cloud, and we are now figuring out how we can start to develop outside the monolith. That's where I think the governance issues are more interesting, because we're building some paved roads for people to be able to deploy services, and we're still trying to figure out what is the best way to wrangle the complexity here. What we do at Etsy today is, I guess, two things. One, for people deploying new services, we have an architecture advisory group. It's a group of facilitators to whom you can bring technical questions, and they will connect the dots for you across the company. There are representatives on it from all the engineering departments at Etsy.


Keyur Govande:

The other thing we're doing is our SRE team has put out a production readiness checklist, and they partner with you when you're trying to launch a new service. They will partner with you to make sure you have crossed all the T's and dotted all the I's and sort of give you feedback about what you're doing and what might be a better way of doing it. So those are our two governance practices today.


Rebecca Parsons:

Pia, how about Spotify? How do you incorporate the feedback of the developers into what you're doing?


Pia Nilsson:

Yeah. So at Spotify, we don't have a strong governance culture, so we have to tread very carefully to be aligned with our own culture. How we have done that is through building a recommendation engine. So every time a build runs at Spotify, the team sees in their build feedback the kind of best practices that the platform mission recommends they consider. That could be regarding testing or architecture. It could be security tiering, like, "You're security tier number one, so think about these things." Or it could be migrations, like, "You are on this Node library version on this stack, and we suggest that you move to this version." Then under each recommendation, we call them checks, we try to explain to the engineers why it would benefit them to actually do something about this. We also sometimes try to give an estimate, if we can, of how much time it would probably take to do a migration or something like that.


Pia Nilsson:

We also do one other thing, and we call it internal fleet management, which is that we try to basically take this pain from our engineers. So how do we do that? I mentioned earlier we have these golden paths. We measure, for all systems at Spotify, how close they are to the golden paths. If they are on the golden path, we say they are in the golden state. If you are not in the golden state, which is true of quite a lot of systems, we can try to migrate them automatically. That's what we call fleet management. How we do that is we push out a lot of automatic pull requests, and the teams have the freedom to merge them, of course. Sometimes we also automerge, when we are super confident that it wouldn't disturb production. So those are the two, sort of: we recommend and nudge, and we do fleet management.
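As a rough illustration of the checks-and-nudges model Pia describes, not Backstage's actual implementation: each check inspects a service's metadata and either passes or yields a recommendation, and a service is "in the golden state" only when every check passes. Every field name, check name, and version number below is hypothetical:

```python
# Illustrative sketch of golden-path "checks" with nudges; all names
# and thresholds here are made up, not Spotify's real checks.
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    node_version: str   # runtime version the service is pinned to
    has_tracing: bool   # does it emit distributed traces?

@dataclass
class CheckResult:
    check: str
    passed: bool
    nudge: str          # the recommendation shown in build feedback

RECOMMENDED_NODE = "16"  # hypothetical platform recommendation

def run_checks(svc: Service) -> list[CheckResult]:
    """Run each golden-path check, collecting pass/fail plus a nudge."""
    return [
        CheckResult(
            "node-version",
            svc.node_version == RECOMMENDED_NODE,
            f"Migrate from Node {svc.node_version} to {RECOMMENDED_NODE} (est. ~1 hour).",
        ),
        CheckResult(
            "tracing",
            svc.has_tracing,
            "Enable distributed tracing for this service.",
        ),
    ]

def in_golden_state(svc: Service) -> bool:
    """A service is in the golden state only when every check passes."""
    return all(r.passed for r in run_checks(svc))

print(in_golden_state(Service("playlists", "14", False)))  # → False
print(in_golden_state(Service("search", "16", True)))      # → True
```

The fleet-management step would then take the failing checks for each service and turn them into automated pull requests for the owning team.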


Rebecca Parsons:

So Tim, both of these examples are digital natives, but we spend a lot of time dealing with enterprises, that nice generic word. How do these things play out when you move out of the digital native arena into enterprises with potentially much larger development organizations, more legacy, different kinds of processes?


Tim Cochran:

Yeah. So this is something that, obviously, at ThoughtWorks, we're working with a lot of enterprises that, as you said, have a lot of legacy. What we're seeing, actually, is a lot of the concepts that Pia and Keyur have been talking about have started to permeate the industry, right? People are recognizing that new ways of governance are a good idea. They're recognizing that developer happiness is important, because it's a challenging market at the moment. Also, motivated developers are more productive, those kinds of things. So everyone is recognizing that. DevOps culture, obviously, has been around for a while.


Tim Cochran:

The other thing that we're seeing companies really moving towards is these ideas of platform teams, right? It's accepting that there are going to be some teams that are developing code for engineers. In the past, I think everybody's probably experienced an engineering team just being let loose, going off and building some crazy platform or something that's going to solve all the problems. What we're actually seeing now is, yes, we do want to build these sorts of platforms and things that are going to give us economies of scale, but we really need to be careful about what we're building. What we do is apply the product management techniques that you use for your product to your platforms and your technical capabilities, right?


Tim Cochran:

So you prioritize in the same way. You talk to your users in the same way. You A/B test. You get feedback, all the same things. It's not an opportunity just to sort of flex your technical muscle. It's we're actually trying to build useful things and also in an incremental way. As product development is working, we want to build it in an incremental way. We want to provide new improvements.


Tim Cochran:

But back to your question. So these are concepts that a lot of companies are asking us for. Obviously, the big difference is there's a big change from where they are to where they want to go. Quite often, what we see is that maybe business and technology are not quite as together as they would be at a digital native. What we recommend to start with is to take an analytical approach and to expose the problems, to create transparency. We find there's a lot of hidden wasted work or rework and that kind of stuff. The Accelerate book is now well-known, so you can use the four key metrics. And, like Pia described, the time to the 10th pull request is a really good metric. If there are things like that, you want to be able to quantify them, right?


Tim Cochran:

A nice thing now is that people have done some work for you. If you read the DevOps Handbook or Accelerate, they've made a correlation between high-performing companies and DevOps practices, right? So you don't necessarily have to justify those. I don't have to justify that these are the four key metrics. That's done for you, if you trust that research. So it's more that if we follow these metrics and we move towards this thing, we'll get there. The other thing is really about the organizational changes, in terms of empowering the developers. So really thinking about talking to developers, giving them more freedom to apply themselves to a problem, as opposed to just being handed requirements and those kinds of things. I think a lot of companies are on that journey.


Tim Cochran:

Of course, for a lot of the stuff that we're talking about, and this feedback, you do need a fair amount of rebuilding, perhaps. Some things will have to be rebuilt, but there are also ways of working with legacy technology. You can put API abstractions around it, and there are ways of containerizing, so that some of that legacy technology can be brought up to work in a DevOps way. You don't have to completely throw away everything and rebuild it. So that's the journey that we're on. It's changing the culture and the processes, and then also addressing the fact that we have all these old systems that do need to be brought up to a certain standard. A very key one, we come across this all the time, is: we started building APIs, but they're not stable. If you're really going to have ownership and you're providing APIs to the rest of your company, people have to be able to rely on them. And that's something that we see a lot of companies struggling with as they move towards adopting APIs and DevOps culture and things like that.


Rebecca Parsons:

Thank you. And so, Pia, do you have other suggestions for how you might justify investment in these kinds of tools? Clearly Spotify has invested a great deal already. Do you find you have to argue for the value of this and if so, how do you do it?


Pia Nilsson:

I think at Spotify, it's pretty clear that this has brought huge value to the company, as we are using clear metrics and we can see the difference. At Spotify, we were able to move the onboarding metric that I mentioned earlier from over 60 days to a stable 20. So it takes 20 days for an engineer to do their 10th pull request, which is our target. And if you have numbers like that in your organization, numbers that people can understand and that have a relationship to the business, onboarding, for example, everyone gets that and why it's important.


Pia Nilsson:

I find it is easy to get buy-in for investments in developer experience. So we rely heavily on metrics, and we run an engineering satisfaction survey every quarter with a third of our engineering population. And that gives us a lot of data, like what are the key blockers? That's one question. So we can really track how we are doing on the context switching, for example. How are we doing on the inter-team dependency blockers? So I think metrics are the way to go. And there are higher-level metrics that one can use and combine with the Accelerate metrics.


Rebecca Parsons:

And Keyur, how about at Etsy?


Keyur Govande:

I'll say that, I guess, my answer will be very similar to Pia's. For the last decade or so, we've been tracking our lead time: once you have a commit that's good to go, how quickly does it get to production? And the thing that I think we're all proud of is that through all the growth that we have seen, the metric has remained flat. So as the complexity of the code base has grown and as the number of people working in it has grown, we've still been able to deploy at the same velocity that we used to when we were much, much smaller. So keeping an eye on some systemic things like this has helped us make the case, when things slow down, that we need to invest, because shipping code is how a technology company grows and makes money.
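The lead time metric Keyur describes, elapsed time from a ready commit to it being live in production, can be tracked from a deploy log. A minimal sketch with made-up timestamps (the log format here is an assumption for illustration):

```python
from datetime import datetime
from statistics import median

def lead_times_minutes(deploys):
    """Minutes from 'commit ready to go' to 'live in production', per deploy.

    `deploys` is a list of (ready_at, live_at) datetime pairs.
    """
    return [(live - ready).total_seconds() / 60 for ready, live in deploys]

deploys = [
    (datetime(2021, 6, 1, 10, 0), datetime(2021, 6, 1, 10, 25)),  # 25 min
    (datetime(2021, 6, 1, 14, 0), datetime(2021, 6, 1, 14, 40)),  # 40 min
    (datetime(2021, 6, 2, 9, 30), datetime(2021, 6, 2, 10, 0)),   # 30 min
]
print(median(lead_times_minutes(deploys)))  # → 30.0
```

Plotting a percentile of this per week or quarter is what makes "the metric has remained flat" a checkable claim rather than a feeling.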


Keyur Govande:

The other thing I was going to say, that we have started doing for the last 15 or 18 months, is we run a quarterly NPS survey with our engineers, and we get about a 50 or 60% response rate. And we're asking them one question, with an empty text box to fill in: would you recommend working in whichever monolith at Etsy you're working in, from one to 10? What we have been able to show to engineering leadership is a steady, upward trend in terms of people's satisfaction with working in the code base. So that has been a good measure of our success, and the empty text box lets people give us very direct feedback about what is not working for them. And we do a bunch of grouping and correlation to figure out what might be the next area that we want to tackle.
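The survey Keyur describes is a standard Net Promoter Score: the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (6 or below). A sketch of the calculation; note that standard NPS is asked on a 0-10 scale, but the same arithmetic applies to a one-to-10 question:

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (<= 6)."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Six responses: two promoters (10, 9) and two detractors (6, 3) cancel out.
print(nps([10, 9, 8, 7, 6, 3]))  # → 0.0
```

Tracking this number quarter over quarter is what gives the "steady, upward trend" Keyur shows to engineering leadership.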


Keyur Govande:

As an example, one thing that has come out of that is JavaScript build times. As a downside for the monolith, all of our JavaScript is in a handful of bundles and it has grown and we need to now be able to scale that build aspect of JavaScript to keep up with people's experience and expectation around, I'm going to save this file and I'm going to refresh my browser and it's going to show up. It's no longer that fast so we're putting in some effort to bring it back to keep the waiting aspect of this feedback loop really small.


Rebecca Parsons:

Well, thank you very much for taking the time to share your stories with our audience and thank you, Tim, for setting the overall context for us and helping us understand why we need to be thinking about things like not just developer experience, which you actually hear a great deal about, but the broader issue of how effective a developer can be in the organization and what we can do to improve that effectiveness. So thank you very much Keyur, Pia and Tim.
