Macro trends in the tech industry | March 2022

Mike Mason

Published: March 29, 2022

We publish the Thoughtworks Technology Radar twice yearly, and it’s a detailed look at specific technologies and techniques that are on the path to industry adoption. As we put together the Radar we have a ton of interesting and enlightening conversations discussing the context of the ‘blips’ but not all this extra information fits into the radar format. These “macro trends” articles allow us to add a bit of flavor and to zoom out and see the wider picture of what’s happening in the tech industry.

The ongoing tension between client and server-based logic

Long industry cycles tend to cause us to pendulum back and forth between a ‘client’ and ‘server’ emphasis for our logic. In the mainframe era we had centralized computing and simple terminals so all the logic — including where to move the cursor! — was handled by the server. Then came Windows and desktop apps which pushed more logic and functionality into the clients, with “two-tier” applications using a server mostly as a data store and with all the logic happening in the client. Early in the life of the internet, web pages were mostly just rendered by web browsers with little logic running in the browser and most of the action happening on the server. Now with web 2.0 and mobile and edge computing, logic is again moving into the clients.

In this edition of the Radar a couple of blips are related to this ongoing tension. Server-driven UI is a technique that allows mobile apps to evolve somewhat in between client code updates, by allowing the server to specify the kinds of UI controls used to render a server response. TinyML allows larger machine learning models to be run on cheap, resource-constrained devices, potentially allowing us to push ML to the extreme edges of the network.
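To make the server-driven UI idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the JSON payload shape, the component types and the `render_screen` helper are hypothetical, and a real mobile client would build native widgets rather than strings. The point is only that the server names the controls and the client looks each one up in a registry it already ships with.

```python
import json

# Hypothetical client-side registry: maps a component "type" named by the
# server to a renderer. A real app would create native widgets here.
def render_text(props):
    return f"[Text] {props['value']}"

def render_button(props):
    return f"[Button] {props['label']} -> {props['action']}"

RENDERERS = {"text": render_text, "button": render_button}

def render_screen(payload: str) -> list[str]:
    """Render each component the server asked for, skipping unknown types."""
    screen = json.loads(payload)
    lines = []
    for component in screen["components"]:
        renderer = RENDERERS.get(component["type"])
        if renderer is None:
            # An older client may not know a newer component type; skip it
            # rather than crash, so the server can evolve ahead of clients.
            continue
        lines.append(renderer(component["props"]))
    return lines

# Made-up server response, for illustration only.
SERVER_RESPONSE = json.dumps({
    "components": [
        {"type": "text", "props": {"value": "Your order has shipped"}},
        {"type": "button", "props": {"label": "Track", "action": "open_tracking"}},
        {"type": "carousel", "props": {"items": []}},
    ]
})

if __name__ == "__main__":
    for line in render_screen(SERVER_RESPONSE):
        print(line)
```

In this sketch the server can reorder or swap the text and button controls without a client release, while the unknown `carousel` type is simply ignored by a client that predates it. That is the "evolve somewhat" tradeoff: layout moves to the server, but genuinely new controls still need a client update.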

The take-away here is not that there’s some new ‘right’ way of structuring a system’s logic and data, rather that it’s an ongoing tradeoff that we need to constantly evaluate. As devices, cloud platforms, networks and ‘middle’ servers gain capabilities, these tradeoffs will change and teams should be ready to reconsider the architecture they have chosen.

“Gravitational” software

While working on the radar we often discuss things that we see going badly in the industry. A common theme is over-use of a good tool, to the point where it becomes harmful, or of using a specific kind of component beyond the margins in which it’s really applicable. Specifically, we see a lot of teams over-using Kubernetes — “Kubernetes all the things!” — when it isn’t a silver bullet and won’t solve all our problems. We’ve also seen API gateways abused to fix problems with a back-end API, rather than fixing the problem directly.

We think that the “gravity” of software is an explanation for these antipatterns. This is the tendency for teams to find a center of gravity for behavior, logic, orchestration and so on, where it’s easier or more convenient to just continue to add more and more functionality, until that component becomes the center of a team’s universe. Difficulties in approving or provisioning alternatives can further lead to inertia around these pervasive system components.

The industry’s changing relationship to open source

The impact of open source software on the world has been profound. Linux, started by a young programmer who couldn’t afford a commercial Unix system but had the skills to create one, has grown to be one of the most used operating systems of our time. All the top 500 supercomputers run on Linux, and 90% of cloud infrastructure uses it. From operating systems to mobile frameworks to data analytics platforms and utility libraries, open source is a daily part of life as a modern software engineer. But as industry — and society at large — has been discovering, some very important open source software has a bit of a shaky foundation.

It takes nerves of steel to work for many years on hundreds of thousands of lines of very complex code, with every line of code you touch visible to the world, knowing that code is used by banks, firewalls, weapons systems, web sites, smart phones, industry, government, everywhere. Knowing that you’ll be ignored and unappreciated until something goes wrong.

Steve Marquess

Heartbleed was a bug in OpenSSL, a library used to secure communication between web servers and browsers. The bug allowed attackers to steal a server’s private keys and hijack users’ session cookies and passwords. The bug was described as ‘catastrophic’ by experts, and affected about 17% of the internet’s secure web servers. The maintainers of OpenSSL patched the problem less than a week after it was reported, but remediation also required certificate authorities to reissue hundreds of thousands of compromised certificates. In the aftermath of the incident it turned out that OpenSSL, a security-critical library containing over 500,000 lines of code, was maintained by just two people.

Log4Shell was a recent problem with the widely-used Log4j logging library. The bug enabled remote access to systems and again was described in apocalyptic terms by security experts. Despite the problem being reported to maintainers, no fix was forthcoming for approximately two weeks, until the bug had started to be exploited in the wild by hackers. A fix was hurriedly pushed out, but left part of the vulnerability unfixed, and two further patches were required to fully resolve all the problems. In all, more than three weeks elapsed between the initial report and Log4j actually having a fully secure version available.

I think it’s important to be very clear that I’m not criticizing the OpenSSL and Log4j maintenance teams. In the case of Log4j, it’s a volunteer group who worked very hard to secure their software, gave up evenings and weekends for no pay, and had to endure barbed comments and angry Tweets. All of that was to fix a problem with an obscure Log4j feature that no person in their right mind would actually want to use and that existed only for backwards-compatibility reasons. The point remains, though: open source software is increasingly critical to the world but has widely varying models behind its creation and maintenance.

Open source exists between two extremes. Companies like Google, Netflix, Facebook and Alibaba release open source software which they create internally, fund its continued development, and promote it strongly. I’d call this “professional open source” and the benefit to those big companies is largely about recruitment — they’re putting software out there with the implication that programmers can join them and work on cool stuff like that. At the other end of the spectrum there is open source created by one person as a passion project. They’re creating software to scratch a personal itch, or because they believe a particular piece of software can be beneficial to others. There’s no commercial model behind this kind of software, no-one is being paid to do it, but the software exists because a handful of people are passionate about it. In between these two extremes are things like Apache Foundation supported projects, which may have some degree of legal or administrative support, and a larger group of maintainers than the small projects, and “commercialized open source” where the software itself is free but scaling and support services are a paid addon.

This is a complex landscape. At Thoughtworks we use and advocate for a lot of open source software. We’d love to see it better funded but, perversely, adding explicit funding to some of the passion projects might be counterproductive — if you work on something for fun because you believe in it, that motivation might go away if you were being paid and it became a job. We don’t think there’s an easy answer but we do think that large companies leveraging open source should think deeply about how they can give back and support the open source community, and they should consider how well supported something is before taking it on. The great thing about open source is that anyone can improve the code, so if you’re using the code, also consider whether you can fix or improve it too.

Securing the software supply chain

Historically there’s been a lot of emphasis on the security of software once it’s running in production—is the server secure and patched, does the application have any SQL injection holes or cross-site scripting bugs that could be exploited to crack into it? But attackers have become increasingly sophisticated and are beginning to attack the entire “path to production” for systems, which includes everything from source-control to continuous delivery servers. If an attacker can subvert the process at any point in this path, they can change the code and intentionally introduce weaknesses or back doors and thus compromise the running systems, even if the final server on which it’s running is very well secured.

The recent exploit for Log4j, which I mentioned in the previous section on open source, shows another vulnerability in the path to production. Software is generally built using a combination of from-scratch code specific to the business problem at hand, as well as library or utility code that solves an ancillary problem and can be reused in order to speed up delivery. Log4Shell was a vulnerability in Log4j, so anyone who had used that library was potentially vulnerable (and given that Log4j has been around for more than a decade, that could be a lot of systems). Now the problem became figuring out whether software included Log4j, and if so which version of it. Without automated tools, this is an arduous process, especially when the typical large enterprise has thousands of pieces of software deployed.

The industry is waking up to this problem, and we previously noted that even the US White House has called out the need to secure the software “supply chain.” Borrowing another term from manufacturing, a US executive order directs the IT industry to establish a software “bill of materials” (SBOM) that details all of the component software that has gone into a system. With tools to automatically create an SBOM, and other tools to match vulnerabilities against an SBOM, the problem of determining whether a system contains a vulnerable version of Log4j is reduced to a simple query and a few seconds of processing time. Teams can also look to Supply chain Levels for Software Artifacts (SLSA, pronounced ‘salsa’) for guidance and checklists.
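As a rough illustration of what that "simple query" might look like, here is a sketch in Python. It assumes each system's SBOM has already been loaded into a dictionary with a `components` list of `name` and `version` entries, a simplified shape loosely modeled on the component lists found in common SBOM formats. The system names, the `find_vulnerable` helper and the version threshold in the example are hypothetical; the right threshold should come from the relevant security advisory, not from this sketch.

```python
import re

def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.14.1' into (2, 14, 1). Suffixes such as '-beta9' are ignored,
    so this is a naive comparison key, not a full version-ordering scheme."""
    return tuple(int(part) for part in re.findall(r"\d+", version.split("-")[0]))

def find_vulnerable(sboms, component_name, first_safe_version):
    """Return (system, version) pairs where the named component is older
    than first_safe_version."""
    threshold = parse_version(first_safe_version)
    hits = []
    for system, sbom in sboms.items():
        for component in sbom.get("components", []):
            if (component["name"] == component_name
                    and parse_version(component["version"]) < threshold):
                hits.append((system, component["version"]))
    return hits

# Made-up SBOM data for three hypothetical systems.
SBOMS = {
    "billing-service": {"components": [
        {"name": "log4j-core", "version": "2.14.1"},
        {"name": "jackson-databind", "version": "2.13.1"},
    ]},
    "reporting-ui": {"components": [
        {"name": "log4j-core", "version": "2.17.1"},
    ]},
    "legacy-batch": {"components": [
        {"name": "log4j-core", "version": "2.0-beta9"},
    ]},
}

if __name__ == "__main__":
    # "2.17.1" is an illustrative threshold; check the advisory for real use.
    for system, version in find_vulnerable(SBOMS, "log4j-core", "2.17.1"):
        print(f"{system} ships log4j-core {version}")
```

The hard part in practice is producing accurate SBOMs for thousands of deployed systems, which is why automated generation matters. Once that inventory exists, the matching step is about as simple as the loop above.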

For more insights into the challenges around securing the software supply chain, why not check out our podcast episode on the subject.

The demise of standalone pipeline tools

“Demise” is certainly a little hyperbolic, but we in the Radar group found ourselves talking a lot about GitHub Actions, GitLab CI/CD, and Azure Pipelines, where all the pipeline tools are subsumed into either the repo or hosting environment. Couple that with the previously-observed tendency for teams to use the default tool in their ecosystem (GitHub, Azure, AWS, etc.) rather than looking at the best tool, technique or platform to suit their needs, and some of the standalone pipeline tools might be facing a struggle. We’ve continued to feature ‘standalone’ pipeline tools such as CircleCI but even our internal review cycle revealed some strong opinions, with one person claiming that GitHub Actions did everything they needed and teams shouldn’t use a standalone tool. Our advice here is to consider both ‘default’ and standalone pipeline tools and to evaluate them on their merits, which include both features and ease of integration.

SQL remains the dominant ETL language

We’re not necessarily saying this is a good thing, but the venerable Structured Query Language remains the tool the industry most often reaches for when there’s a need to query or transform data. Apparently no matter how advanced our tooling or platforms are, SQL is the common denominator chosen for data manipulation. A good example is the preponderance of streaming data platforms that allow SQL queries over their state, or use SQL to build up a picture of the in-flight data stream, for example ksqlDB.

SQL has the advantage of having been around since the 1970s, with most programmers having used it at some point. That’s also a significant disadvantage: many of us learnt just enough SQL to be dangerous, rather than competent. But with additional tooling, SQL can be tamed, tested, efficient and reliable. We particularly like dbt, a data transformation tool with an excellent SQL editor, and SQLFluff, a linter that helps detect errors in SQL code.
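To show what "tested" SQL can mean at its simplest, here is a small sketch that runs a transformation against an in-memory SQLite database and checks the result. It does not use dbt or SQLFluff; it only uses Python's standard `sqlite3` module, and the table names, columns and the `transformed_rows` helper are made up for the example. Amounts are stored as whole cents so the comparison is exact.

```python
import sqlite3

# Hypothetical transformation: total revenue per customer from raw orders,
# ignoring cancelled orders.
TRANSFORM_SQL = """
CREATE TABLE customer_revenue AS
SELECT customer_id, SUM(amount_cents) AS total_cents
FROM raw_orders
WHERE status <> 'cancelled'
GROUP BY customer_id
"""

def transformed_rows(orders):
    """Load orders into a throwaway database, run the transformation,
    and return the resulting rows ordered by customer."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE raw_orders "
            "(customer_id INTEGER, amount_cents INTEGER, status TEXT)"
        )
        conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", orders)
        conn.execute(TRANSFORM_SQL)
        return conn.execute(
            "SELECT customer_id, total_cents FROM customer_revenue "
            "ORDER BY customer_id"
        ).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    sample = [
        (1, 10000, "paid"),
        (1, 5000, "paid"),
        (2, 7000, "cancelled"),
        (2, 3000, "paid"),
    ]
    # The cancelled order must not count towards customer 2's total.
    assert transformed_rows(sample) == [(1, 15000), (2, 3000)]
    print("transformation test passed")
```

A check like this catches the classic "just enough SQL to be dangerous" mistakes, such as forgetting the filter on cancelled orders, before the query reaches production data. SQL dialects differ, so a test on SQLite is only a first line of defense for a query destined for another engine.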

The neverending quest for the master data catalog

A continuing theme in the industry is the importance and latent value of corporate data, with more use cases arising that can take advantage of this data, coupled with interesting and unexpected new capabilities arising from machine learning and artificial intelligence. But for as long as companies have been collecting data, there have been efforts to categorize and catalog the data and to merge and transform it into a unified format, in order to make it more accessible, more reusable, and to generally ‘unlock’ the value inherent in the data.

Strategies for unlocking data often involve creating what’s called a “master data catalog”: a top-down, single corporate directory of all data across the organization. There are ever more fancy tools for attempting such a feat, but they consistently run into the hard reality that data is complex, ambiguous, duplicated, and even contradictory. Recently the Radar has included a number of proposals for data catalog tools, such as Collibra.

But at the same time, there is a growing industry trend away from centralized data definitions and towards decentralized data management through techniques such as data mesh. This approach embraces the inherent complexity of corporate data by segregating data ownership and discovery along business domain lines. When data products are decentralized and controlled by independent, domain-oriented teams, the resulting data catalogs are simpler and easier to maintain. Additionally, breaking down the problem this way reduces the need for complex data catalog tools and master data management platforms. So although the industry continues to strive for an answer to ‘the’ master data catalog problem, we think it’s likely the wrong question and that smaller decentralized catalogs are the answer.

That’s all for this edition of Macro Trends. Thanks for reading and be sure to tune in next time for more industry commentary. Many thanks to Brandon Byars, George Earle, and Lakshminarasimhan Sudarshan for their helpful comments.

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
