
Cloud: beyond infrastructure thinking

Learning from past mistakes


It might seem odd to give advice on cloud adoption at this point in the hype cycle. After all, much of the industry considers public cloud a settled area with established practices and success stories. Surely there’s more than enough advice to help newcomers plan and execute their cloud transition? Despite all the available information, I still see organisations struggling on their cloud journey. Many well-intentioned cloud migration programs sputter and stall in their attempt to turn the promise of on-demand elastic compute into concrete, measurable benefits to their business.


Sure, we have shelves of literature describing how to architect cloud systems “well” and how to securely engineer complex compute environments and network topologies. But there’s still very little information on how to structure your organisation or how to employ operating models that embrace the potential of extreme virtualisation. Worse, I see businesses busily ignoring the lessons of the past and creating fleets of assets that will be, at best, costly to maintain and, at worst, unmaintainable. In their haste to move to the cloud, organisations are accruing technical debt faster than they can pay it down. They’re creating one-off, hard-to-replicate environments and failing to implement even the most basic quality checks on those assets. In the long run, these assets will consume far more in maintenance costs than will ever be saved by moving from an on-premise data center to a public cloud provider.


In this article I’ll point out some of the common pitfalls I’ve seen companies encounter in their move to the cloud and then humbly offer an alternative. I’d like to encourage a cloud adoption approach that takes more than infrastructure into account. I’m of the firm opinion that the shift to public cloud is so profound that we need to change the entire business to adapt. We can’t apply the same culture, organisational structures, roles, responsibilities, engineering practices and vendor choices that have worked (or not) for us in the past. All of these need to evolve beyond the days when software was hosted in a physical facility built and maintained by a separate group of infrastructure specialists. Certainly, this split still exists, but the power we now have to completely reconfigure our virtual data centers with a few lines of code demands a different approach.


Why is cloud success so elusive?


I’ve seen this pattern play out a number of times over the course of my career. A new technology comes along, promising to be the wave of the future. Industry analysts draw quadrants and select winners, and businesses assume that by going with a winner, they can avoid the hard work of learning a new discipline. They assume that by selecting the right IT outsourcing partner and managing the costs closely, the business benefits will naturally accrue. This has never been the case. Businesses that succeed with new technology do so by adapting their organisation, their operating model and their engineering capabilities to the new paradigm. In many cases they need to adapt their business model as well.


In the case of public cloud, one way this fallacy manifests is in the belief that cloud is simply an alternate form of infrastructure. Cloud becomes the responsibility of the infrastructure or operations division who previously managed data centers and physical networks. Typically, these teams manage capital assets as a service to the rest of the organisation. When infrastructure teams become the custodians of corporate cloud resources, those facilities become — in effect — an extension of the existing infrastructure. This approach is widespread and often results in “hybrid” cloud implementations where the public compute resources seamlessly (in theory) extend the on-premise assets.  


While this approach might make intuitive sense and works — to a degree — it doesn’t acknowledge or accommodate the potential revolution in technology promised to businesses by cloud pundits and analysts. In organisations where cloud is managed by an infrastructure team, environments tend to be doled out like a scarce capital resource. Often, digital delivery teams have to request a cloud environment via a support ticket and wait for the environment to be manually or semi-automatically provisioned. This makes no sense in a cloud-native world, where resources are abundant and can be provisioned with a line of code. If the concern is security or compliance, there are ways to address those risks without giving up self-service access to cloud environments on demand.
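To make that concrete, here’s a minimal sketch of on-demand provisioning using the AWS SDK for Python (boto3). The CIDR ranges are purely illustrative, and every major vendor offers an equivalent API:

```python
import boto3

# A whole virtual network, created with a single API call.
ec2 = boto3.client("ec2")
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]

# A subnet inside it -- one more call. No ticket, no waiting.
subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")
print(f"Provisioned {vpc_id} with subnet {subnet['Subnet']['SubnetId']}")
```

The point isn’t the specific calls; it’s that an entire environment can be conjured, modified and destroyed programmatically, which is what makes ticket-driven provisioning feel so anachronistic.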


One of the fundamental misunderstandings underlying this approach is the belief that cloud is just a virtual form of hardware infrastructure. But cloud isn’t hardware. Rather, it’s 100% software.  


The same businesses that hand over cloud implementation to an infrastructure or operations team would never dream of handing that same team a major, multi-year software development project. But that’s what cloud implementation is: a major, multidisciplinary software delivery effort. Nobody would undertake even a medium-size software development project these days without things like a product owner, experience designers, experienced software tech leads, automated testing, and quality assurance. But I’ve seen countless cloud migration projects executed with none of this software professionalism. Sometimes a DevOps team (an oxymoron) is involved, but this is still seen as a cloud specialist team, not a general software delivery team.


Another common pitfall that businesses encounter in their rush to move to the cloud — and which constrains their long-term success — is to become deeply and inextricably entangled with a single cloud vendor. Again, the industry has a long history of experience in managing IT outsourcing and most experts agree on the need to maintain control and flexibility over outsourced service arrangements.  


I recall working with a large financial services company a number of years ago to help it move away from its chosen database vendor. The open-source database alternatives had matured to the point where they equaled or surpassed the vendor’s products, commercial support was available for the open-source platform and the commercial database vendor was becoming less responsive to the customer’s needs over time. Sounds easy, right? After all, SQL is a standard. However, the program was long and expensive and, in the end, didn’t actually enable the customer to retire the database vendor’s products entirely.


The reasons are familiar to anyone who has worked in this field. Over the years, the vendor had introduced innumerable “time-saving” proprietary features into their base product. DBAs succumbed to the temptation to use these features and implemented bits and pieces of business logic directly in the database platform, without the usual software engineering safeguards. They also took advantage of proprietary SQL extensions that made it easier to store and retrieve things like time series and images. But the real portability killer was the hidden integration with other products in the vendor’s substantial suite: integration adapters with proprietary point-and-click configuration, heavy dependence on the vendor’s optimised storage appliances and automatic integration with the vendor’s identity and access management platform. All of this entanglement proved slow and tedious to unpick and, of course, the vendor’s consultants, who had been so helpful in adopting the new features, had no advice to offer about moving away from the products. In the end the customer threw up their hands and accepted that some systems would simply have to live out their lifetime until no more end users depended on them.


Although this is a familiar story, the lessons learned seem to be lost on the current generation of organisations rushing to pledge their allegiance to one cloud vendor or another. Partly this is due to the market domination of a single vendor that more or less invented the idea of cloud computing and whose offerings long outpaced those of its competitors. But we are at a point today where businesses seeking public cloud services have a choice of vendors. The three major vendors have achieved relative parity across a fundamental set of services and are competing fiercely to capture a larger portion of the market. Their pricing, the services they offer and their business practices all encourage cloud customers to consume ever more services and to take greater advantage of the vendor’s unique, differentiating features. These proprietary features are inherently difficult to port from one vendor to another. Sometimes this is to the consumers’ advantage. Cloud vendors are outdoing one another in providing a seamless, low-friction developer experience. But single-vendor entanglement also carries risk — primarily by moving control over IT decision-making away from the business and into the hands of the vendor.

I see business after business become ever more dependent on their chosen cloud vendor, without ever really making a conscious decision to do so.

Many of these businesses are now effectively engaged in a deep, de facto business partnership with their cloud vendor. Their business success is now strongly dependent on the success of the vendor. Unfortunately, this dependence is seldom reciprocal.  


You might argue that the established cloud vendors are far too large to ever go out of business and that this risk of failure is too remote to warrant forgoing the amazing productivity benefits of deep vendor entanglement. However, there are any number of ways in which a business partnership can disintegrate. Personality clashes, conflicts of interest, billing disputes, security concerns, etc. could all result in a need to move away from a cloud vendor. What happens when one of these situations arises and you realise it is too costly to move your core business assets away from your former vendor/partner? Of course this situation isn’t unique to the cloud business. I’ve worked through many IT replatforming initiatives where the enormous but inevitable cost of exiting a platform was never anticipated or incorporated into the business case. But the lessons of the past seem particularly difficult to remember when confronted with a technology shift as profound and potentially beneficial as cloud computing.  



Adopting cloud holistically: four dimensions


I’ve noticed that the most successful cloud adoption strategies go beyond replatforming and instead incorporate a 360-degree approach to change. You might be lucky enough to lower costs by considering infrastructure alone, but to see an impact on top-line revenue, market share or innovation, your response must be broader. Shifting from on-premise infrastructure to cloud hosting is only the first step. Maximising the business benefit from a move to cloud requires attention to the following four dimensions:


Technical excellence. Cloud is software, and leveraging it effectively requires an investment in software engineering capability. Cloud-native engineering is more than simply leveraging a vendor’s services; it requires rethinking some common assumptions about resilience and application design. Infrastructure code demands the same attention to quality as you would apply to customer-facing systems. Successful software delivery efforts should deliver business value early and often. Cloud implementation is no different.


Autonomy with alignment. Moving from a world of physical resources — maintained by specialists and housed in a remote colocation facility — to software-defined network, compute, storage and support is a significant organisational disruption. Delivery teams are now responsible for defining and maintaining their own hosting environments. Security, compliance, strategic alignment and support responsibilities all shift to the left.  


Self-service platforms. Cloud vendors — through their APIs and broad education efforts — have democratised access to hosting environments. This ease of access must be passed along to corporate developers if corporations are expecting to see the step change in productivity that cloud affords. Internal, self-service cloud platforms should be optimised to roll out customer-centric digital products at an accelerating pace.


Vendor choice. Every business needs a multi-cloud strategy. That strategy might be to accept the risk of single-vendor dependency in exchange for developer productivity and favourable pricing. But this must be a deliberate choice rather than a series of unconscious actions. It also pays to be aware of the unique services each vendor offers so that you can choose from a palette of options, rather than being tied to a single vendor for all your needs.

Technical excellence

The ability to provision resources and manage their operation entirely through APIs is a defining characteristic of public cloud. Not only does this provide an interface through which we can define pretty much anything in even the most complex hosting environments, it does so in a uniform, consistent way. The effect this has on the way we think about these resources is profound. Someone, somewhere, sometime must eventually rack servers, run cables, configure switches, build kernels, etc., but for most of us, creating a hosting environment for our applications is entirely a software development activity that can be done from our desks. Admittedly, there’s a vast body of knowledge over and beyond ordinary programming that must be mastered to do this well, but most infrastructure creation is now fundamentally a software development activity.


As a software development activity, it requires a certain discipline around code modularity, coupling, clarity of intent, and testability. These practices are vital because the valuable corporate asset being created is no longer the infrastructure itself. Instead, it’s the code that defines the infrastructure. If that code isn’t written and evolved according to established engineering practices, it will be unsustainably costly to maintain.
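As a sketch of what that discipline can look like, here’s a small infrastructure definition written with the AWS CDK for Python, alongside the kind of unit test you’d write for any other code. The stack and bucket names are my own invention, and the same idea applies with any infrastructure-as-code toolchain:

```python
from aws_cdk import App, RemovalPolicy, Stack
from aws_cdk import aws_s3 as s3
from aws_cdk.assertions import Template
from constructs import Construct


class AuditLogStack(Stack):
    """Storage for audit logs: versioned, and never deleted by accident."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        s3.Bucket(
            self,
            "AuditLogs",
            versioned=True,
            removal_policy=RemovalPolicy.RETAIN,
        )


# The same quality checks we'd apply to application code: a unit test
# asserting that the synthesised template keeps versioning switched on.
def test_audit_bucket_is_versioned():
    template = Template.from_stack(AuditLogStack(App(), "TestStack"))
    template.has_resource_properties(
        "AWS::S3::Bucket",
        {"VersioningConfiguration": {"Status": "Enabled"}},
    )
```

Code like this lives in version control, goes through review and runs in a delivery pipeline, exactly like the customer-facing systems it supports.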



Autonomy with alignment

Several organisational factors affect the success of a move to public cloud. These include:

  • The impact of shifting hosting decisions from infrastructure specialists to ordinary software developers
  • Shortages of cloud skills, or the concentration of those skills in the hands of a few experts, which can inhibit cloud success
  • Lack of transparency, which can result in cloud costs ballooning unexpectedly
  • How cloud usage is accounted for. Metered usage isn’t really a capital expense; cost accounting for public cloud is quite different from fixed, on-premise infrastructure and needs to include savings realised by retiring legacy infrastructure.
  • How impact is measured and governed. Useful tools and a compelling developer experience are more effective governance practices than tollgates and review boards.


The software-heavy nature of public cloud management and the ability of developers to define and provision their own hosting environments shift more hosting decisions into the teams that deliver software and away from traditional operations or infrastructure teams. Businesses derive the most value from public cloud when they’re able to organise their digital delivery around autonomous teams with long-term, sticky responsibility for operating, supporting, maintaining and evolving the products they build. These teams must be empowered to make their own hosting decisions and then held accountable for living with and supporting those decisions.


Of course, empowering teams to build, own and operate their own assets doesn’t mean they’re completely free of governance or alignment to overall business goals. Cost management, staffing flexibility, predictability, security and compliance are all reasons to have an overarching vision and an architectural decision-making framework to govern public cloud usage. This organisation-wide architectural direction should equip teams on the ground with the tools and knowledge to achieve a standard of technical excellence and to make good decisions about which cloud services to use — and which not to use. It should also provide guidelines and standards so that teams can build hosting environments that are inherently secure and compliant. This might include automated infrastructure scanning tools, approved artefacts, preconfigured network topologies or software supply chain verifications.
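As one small illustration of such a guardrail, here’s a hypothetical pre-deployment policy check that a platform or architecture group might hand to delivery teams. The resource schema and the rules are invented for the example; in practice you’d more likely reach for established infrastructure scanning tools:

```python
# A hypothetical guardrail: scan declared resources for policy breaches
# before anything is deployed. The resource schema here is invented
# purely for illustration.
def find_policy_breaches(resources: list[dict]) -> list[str]:
    """Return human-readable findings for resources violating baseline policy."""
    findings = []
    for res in resources:
        props = res.get("properties", {})
        if props.get("public_access", False):
            findings.append(f"{res['name']}: resource is publicly accessible")
        if not props.get("encrypted", True):
            findings.append(f"{res['name']}: encryption at rest is disabled")
    return findings


# Run as a pipeline step: any finding fails the build before deployment.
resources = [
    {"name": "reports-bucket", "properties": {"public_access": True}},
    {"name": "orders-db", "properties": {"encrypted": True}},
]
for finding in find_policy_breaches(resources):
    print("POLICY BREACH:", finding)
```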



Self-service platforms

Long before enterprises began considering public cloud as a viable option to host their critical business applications, it caught the imagination and enthusiasm of software developers everywhere. Much of the practice and tooling for defining public cloud infrastructure as code was invented by the open-source community in response to what was, at the time, a revelatory new ability to manipulate arbitrarily complex infrastructure setups entirely through APIs. This capability put power and productivity directly into the hands of developers in a way they’d never experienced before. This newfound productivity, in turn, led to many of the digital darling startups that pioneered new business models over the last 10-15 years. This self-service infrastructure on demand is one of the key prerequisites in the very definition of public cloud. So it’s ironic that when many enterprises adopt public cloud, the first thing they do is remove this self-service capability from developers’ hands.


However, these restrictions aren’t put in place maliciously. Businesses need some way to impose consistency, assure compliance and prevent the runaway costs of cloud environments. The challenge is to meet those needs without taking self-service out of developers’ hands.
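One way to square that circle is an internal platform that keeps provisioning self-service while baking the controls in. The sketch below is hypothetical throughout: the template catalogue, tagging scheme and deployer are stand-ins for whatever your platform team actually provides:

```python
from dataclasses import dataclass

# Preconfigured, pre-hardened environment shapes approved by the platform
# team. Names and paths are hypothetical.
APPROVED_TEMPLATES = {
    "web-service": "templates/web-service-v3",
    "batch-job": "templates/batch-job-v1",
}


@dataclass
class EnvironmentRequest:
    team: str            # owning team, for cost attribution and support routing
    template: str        # must come from the approved catalogue
    monthly_budget: int  # spend-alert threshold, in dollars


def deploy(template_path: str, tags: dict) -> str:
    # Stand-in for the real deployment engine (Terraform, CDK, Pulumi, ...).
    return f"deployed {template_path} with tags {tags}"


def provision(request: EnvironmentRequest) -> str:
    """Self-service provisioning: instant for approved shapes, no tickets."""
    if request.template not in APPROVED_TEMPLATES:
        raise ValueError(
            f"unknown template {request.template!r}; "
            "choose from the approved catalogue"
        )
    # Tagging every environment with its owner keeps costs transparent.
    tags = {"team": request.team, "budget": str(request.monthly_budget)}
    return deploy(APPROVED_TEMPLATES[request.template], tags)


print(provision(EnvironmentRequest("payments", "web-service", 2_000)))
```

Developers still get an environment in seconds; the organisation still gets consistency, compliance and cost attribution.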


Vendor choice

The industry now has a handful of public cloud vendors to choose from, all of whom have a relatively similar set of basic services on offer. This presents a problem for cloud vendors in that their primary product is really a commodity that can be differentiated only on price. There’s nothing proprietary built into the basic cloud infrastructure as a service (IaaS) model. The vendors’ response to this conundrum is to offer an ever-expanding variety of proprietary, differentiating services. While many of these services are quite attractive and can lead to enormous cost savings and enhanced developer productivity, their use creates a strong affinity for, or dependency on, a single provider. If this is a conscious choice made rationally and intentionally then it may very well be appropriate. However, my observation is that the entanglement with a single cloud provider usually happens incrementally and unconsciously.

By the time a customer realises they no longer have a choice of cloud providers it’s too costly and time-consuming to port their most critical assets to another vendor.

The primary question a business must answer for every cloud workload is: how deep should the entanglement be? At one extreme are intentionally short-lived applications that need to be implemented and deployed very quickly. These might be small experiments that don’t warrant much up-front investment. For those assets, it makes sense to take advantage of every time- and cost-saving proprietary feature the vendor has to offer; there’s no hidden portability cost, since the application will be retired before a change of vendors is necessary. At the other end of the spectrum, core systems that support essential business processes and may be systems of record are usually long-lived, maintained for 10 years or longer. For these systems, the risk of having to change cloud vendors at some point during the asset’s lifespan is much greater and should be taken into account when deciding which vendor services to use.


If the price of portability isn’t accounted for in the total lifecycle cost of the solution, there will be an unplanned balloon payment due at replatforming time. Gregor Hohpe likens the upfront cost of portability to buying options. If you judge it extremely unlikely that you’ll ever need to exit your cloud provider, the value of the option is low. However, if your assets are long-lived, the risk goes up and the value of an option to lower porting costs goes up proportionally. In other words, it sometimes lowers the total cost of ownership to avoid the time-saving-but-proprietary services that make it difficult to port from one cloud vendor to another. In any case, this needs to be a deliberate decision made before embarking on a major move to the cloud.
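A back-of-the-envelope version of that options reasoning, with entirely invented figures, might look like this:

```python
# All figures below are invented, purely to illustrate the reasoning.
def portability_option_value(p_exit: float,
                             exit_cost_entangled: float,
                             exit_cost_portable: float) -> float:
    """Expected saving from paying up front to keep a future exit cheap."""
    return p_exit * (exit_cost_entangled - exit_cost_portable)


upfront_cost_of_portability = 400_000  # e.g. forgoing proprietary services

# Short-lived experiment: exit is very unlikely, so the option isn't worth it.
print(portability_option_value(0.02, 5_000_000, 500_000))  # 90,000 < 400,000

# Ten-year system of record: an exit is far more plausible over its lifetime.
print(portability_option_value(0.40, 5_000_000, 500_000))  # 1,800,000 > 400,000
```

Under these made-up numbers, the option isn’t worth buying for a short-lived experiment but clearly is for a long-lived system of record, which matches the workload-by-workload reasoning above.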



Keeping your eyes open


Cloud has become ubiquitous within the enterprise and with good reason. Done well, cloud is an enabler of the modern digital business. But as we’ve seen, there are still challenges. By taking a 360-degree approach to cloud adoption, you’ll be able to dive into cloud with a better appreciation of those challenges. That isn’t to say your cloud strategy will become risk free. Rather, you’ll have a better idea of whether the risks associated with your approach are ones you’re willing to take.

