In each edition of the Radar we include a handful of “themes” that help the reader understand larger trends among the details of the individual blips. Space is limited, though, and there are often more trends than we have room for. These “macro trends” articles expand on each Radar with a few more stories from the tech industry. Read on for the bigger picture.
Multiple “mono” repos
The concept of the monorepo — a single large source code repository that stores code and assets across all projects within an organization — has gained traction over the last few years, with Google popularizing the approach. The main benefits are easier sharing of source code across teams and the ability to refactor across the company-wide code base. But Google’s success with the technique doesn’t mean it’s replicable by everyone else in the industry. A single build process across everything in the repo is often cited as a key success factor, but this takes discipline and engineering effort to achieve.
We’re now seeing organizations that want to create multiple “mono” repos. This is sensible, but it compromises some of the value of a monorepo and requires careful thought about which assets live in which repository. Advice in this area is difficult to give — the consulting classic of “it depends” applies — but if your organization is enticed by the idea of a monorepo, you should think twice and examine whether you have the use cases, organizational structure and engineering tooling required to make the approach pay off: not every organization is Google.
Evolving and capitalizing on platform teams
It’s obvious that good use of platforms — whether they’re at an infrastructure, data, API or business capability level — can accelerate getting products and features into production. Everyone recognizes the value in a platform; the question is how best to build one. Increasingly, the answer is to form a platform team tasked with creating a compelling offering that other teams can use to accelerate their work. The concept of the platform team is maturing and the industry is discovering both good and bad ways for this to work:
It’s critical to have development teams as customers for the platform and to consider it a product — that means funding it for the long term, thinking carefully about the value it creates, understanding the use cases, and not building in a vacuum with a Field of Dreams “build it and they will come” mentality. Building a brand for your platform and working to ensure that teams can adopt it and get value from it is important too.
Structuring teams across and around a platform requires some careful thinking. We’re big fans of the book Team Topologies which provides a structured approach to organizing teams and feeding learnings back into a continuous improvement cycle.
New patterns are emerging, such as a team oriented around a “technology domain,” owning both old and new tooling so it can make improvements and upgrades in an efficient, controlled manner. As an example, a team might be in charge of pipeline tooling for the whole organization, allowing it to optimize migration paths and ensure security and compliance controls. This contrasts with bringing in a new team to write a new tool, an approach that lacks access to the knowledge and capabilities of the original technology team.
There is some tension here between using a platform team approach vs. single teams owning both application and infrastructure. At a smaller scale the single team strategy works well, but as you scale it becomes impractical since you can’t really embed knowledge of all the platform concerns into a large number of application teams. As an organization scales further, it might become sensible to create multiple platform teams, each owning different parts of the technology domain.
Unfortunately, we’re seeing some organizations put their ‘platforms’ behind a bad-old-days ticketing system. This just doesn’t work: it increases friction and teams can’t self-serve or manage their code on the journey into production.
Platform teams can work with engineering centres of excellence to define good patterns and act as an enabling force to help teams incorporate newer cloud-native tech. This is a much better alternative than teams each needing to spend time learning these patterns on their own, and we strongly recommend the platform team concept to our clients.
Consolidation of all-in-one delivery infrastructure threatens innovation
Back in the days of traditional on-premises client/server software, you might have heard that a company was “a Microsoft shop” that used that set of tooling for everything. So a team might end up using Visual Studio for development and SQL Server as the database, even if an Oracle database had features that could have made it a better choice for a particular use case (I’m not knocking SQL Server here or promoting Oracle, just noting that those two databases often competed against each other). Being “a Microsoft shop” was a valid decision and offered clarity in tech stack choices, consolidation of developer and operations skills, and often an easier path through procurement — everyone had a Microsoft licensing deal and these tools could be added to it.
Today, large cloud platforms such as AWS, GCP and especially Azure offer “front to back” cloud application development lifecycle tools, including requirements management, source control, build pipelines, production deployment, and trouble ticketing: an analogous “stack” for the cloud age. This makes the default choice easy to understand and procure, providing a team with all the tools it needs to get software into production. The benefits are similar to those of picking a single tech stack in the 2000s.
The problem we see is that these “good enough” choices may lag behind an industry-leading independent alternative. That threatens overall innovation, since the defaults are now owned and controlled by big cloud vendors. And teams often accept the default choice since it (mostly) works well enough and fighting through procurement or approval processes for a different option just isn’t worth it. As one of the Radar authors said in our discussion, “when all you have is GitHub, the whole world looks like a pull request.”
The fight for cloud market share motivates the big providers to continue to create these kinds of outcomes. Pseudo “walled gardens” for each cloud increase lock-in and the cost of switching. The stakes are high — in its race to capture market share, Google’s Cloud division lost $5.6bn in 2020, money it clearly intends to earn back over the long term through customer stickiness. Our advice here is to go in with eyes open: understand that there’s a price to pay for convenience, and accept it only if the benefits of expediency are worth it.
The fall and rise of SQL
Before the “big data” trend, relational databases were heavily optimized for on-disk storage, were centralized and scaled vertically, and included a tightly integrated query engine. Database designs favored writes, with little duplication of data but lots of join activity on reads. If your database wasn’t fast enough, you bought a bigger box, and the tightly integrated query language reinforced all of these characteristics.
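That write-optimized, normalized style can be sketched with a tiny relational example. The table and column names here are invented, and SQLite is used only to keep the sketch self-contained:

```python
import sqlite3

# Hypothetical normalized schema: each fact is stored exactly once (no
# duplication), so reads reconstruct information via joins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total INTEGER
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 10), (2, 1, 25)])

# A read must join: the customer's name is never duplicated into orders.
rows = conn.execute("""
    SELECT c.name, COUNT(*), SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
""").fetchall()
print(rows)  # [('Ada', 2, 35)]
```

Every read that needs the customer’s name pays for a join — exactly the trade-off the NoSQL movement later pushed back on by duplicating data for read speed.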
These design decisions — which had served us well for decades of data storage — were at odds with the hardware innovations happening at the time. Disks were getting cheaper while growing in capacity, solid-state drives were entirely changing the performance characteristics of storage, and CPU core counts were increasing rather than clock speeds. This led to the creation of the Hadoop Distributed File System and ignited a Cambrian explosion in data technologies.
Suddenly, data could live outside relational databases. The “NoSQL” movement popularized many types of datastore such as key/value, document, and graph databases. Data could even live outside of a database inside a data lake or an S3 bucket. Duplication of data was considered acceptable in order to optimize read performance, as more and more use-cases began to require “web scale” application and database performance. And we introduced the notion of a decoupled query engine, with standalone query languages that could plug into different storage systems.
These distributed data stores weren’t without their challenges, and while some organizations were grappling with all the moving pieces and the general difficulty of a model without simple atomic commits, the traditional database industry was learning and improving. Many SQL databases added functionality, such as Postgres supporting a key/value store. Storage engines improved with hybrid support for both row- and column-based storage, giving greater flexibility for both transactional and analytic workloads.
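As a rough sketch of that added functionality, here is document-style data queried with plain SQL. Postgres offers hstore and JSONB for this; the example below uses SQLite’s built-in JSON functions purely to stay self-contained, and the table and fields are invented:

```python
import sqlite3

# Schemaless, document-style payloads stored inside a relational database,
# queried with ordinary SQL rather than a separate document store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO events (payload) VALUES (?)",
             ('{"type": "signup", "plan": "pro"}',))

# json_extract reaches into the document without any fixed column schema.
plan = conn.execute(
    "SELECT json_extract(payload, '$.plan') FROM events "
    "WHERE json_extract(payload, '$.type') = 'signup'"
).fetchone()[0]
print(plan)  # pro
```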
Then came the cloud, with vendors offering cloud-scale databases — BigQuery, Redshift, Snowflake and the like — that provide both scalability and a more traditional, “SQL-like” experience for developers. This led to a resurgence in SQL-based tooling. Tools like dbt now allow us to solve large and complex problems with small, independent units of SQL that can be tested and composed together in a pipeline.
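The composition idea behind dbt can be sketched without dbt itself: each “model” is essentially a named SELECT that builds on the one before it. The tables, views and figures below are invented for illustration, with plain SQL views standing in for dbt models:

```python
import sqlite3

# Two small, independently testable SQL "models" composed into a pipeline.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_payments (id INTEGER, amount INTEGER, status TEXT);
    INSERT INTO raw_payments VALUES (1, 100, 'ok'), (2, 50, 'failed'), (3, 70, 'ok');

    -- "model" 1: a small staging transformation
    CREATE VIEW stg_payments AS
        SELECT id, amount FROM raw_payments WHERE status = 'ok';

    -- "model" 2: composes on top of model 1
    CREATE VIEW fct_revenue AS
        SELECT SUM(amount) AS revenue FROM stg_payments;
""")
revenue = conn.execute("SELECT revenue FROM fct_revenue").fetchone()[0]
print(revenue)  # 170
```

In dbt the same layering is expressed with `ref()` between model files, which also gives you dependency ordering and testing for free.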
So where are we today? With ELT on cloud data warehouses, and tools like dbt enabling transformations in the warehouse, there is a tension with data lakes: we often see a ‘slimming’ of the lake and a resurgence of warehouses on the periphery. A related concern around the decentralization of data and democratization of access — often addressed with an approach such as Data Mesh — may currently be better served by the cloud data warehouses than by the lake ecosystem. We think the next frontier is to merge OLTP and OLAP workloads into the same database, and neither the cloud data warehouses nor the data lake ecosystem currently has a good answer for that.
Mainstream machine learning
You’d have to be living under a rock not to realize that machine learning is important in almost every type of software today, and indeed the industry has been talking about it for years. But we may have hit the inflection point where ML becomes just a standard thing that we all do.
For example, organizations should use CD4ML to put machine learning models into production; it should be a continuous process just like all the other aspects of software creation. Our colleagues at Fourkind are bullish on organizations’ ability to own their own ML models — while many of the cloud providers offer AI as a service, it’s actually not that difficult to purpose-build a model for your own use case. This has the benefits of ownership and control of the model, algorithms and training data, and not being beholden to a cloud provider for access to theirs. The Radar also includes a blip on the simplest possible ML model, a suggestion that we shouldn’t always leap to deep neural networks to solve problems. We also discussed the fact that with modern container infrastructure we can often train a useful model on Kubernetes or ECS instead of in a dedicated ML environment. So it certainly seems that ML is mainstream in the industry, and if your organization is still treating it with undue reverence or separation from your processes, it’s worth looking to integrate it more tightly.
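As one illustration of the “simplest possible ML model” idea, here is a majority-class baseline in plain Python: the floor that any fancier model must beat before it earns its complexity. The labels, features and class name are invented for this sketch:

```python
from collections import Counter

class MajorityClassifier:
    """Hypothetical baseline: always predict the most common training label."""

    def fit(self, X, y):
        # Ignore the features entirely; remember only the majority class.
        self.prediction = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.prediction for _ in X]

# Toy churn data: three 'stay' labels vs two 'churn' labels.
features = [[3], [40], [7], [9], [1]]
labels = ["churn", "stay", "stay", "stay", "churn"]

model = MajorityClassifier().fit(features, labels)
preds = model.predict([[5], [30]])
print(preds)  # ['stay', 'stay']
```

If a deep neural network can’t beat this baseline’s accuracy on held-out data, it isn’t earning its operational cost — which is the point of starting simple.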
That wraps things up for this edition of macro trends. I’d like to thank my fellow ‘Doppler’ members for their help with the article, and in particular Brandon Byars, Ian Cartwright, Ranbir Chawla, Scott Davies, Chris Ford, Kief Morris, Rebecca Parsons, Luciano Ramalho, Bharani Subramaniam and Lakshminarasimhan Sudarshan for their thoughts and suggestions.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.