Buy now, pay later
Over the last few years, we’ve seen an increased emphasis on the “developer experience” of tools and platforms. Whether it’s an IDE, an open-source library or some kind of cloud platform, the ease with which developers can adopt and use it is increasingly important. This is a good thing and often a driving force for innovation. Treating developers as customers for your product can definitely give you a leg up.
We’re starting to see problems, however, when ease-of-use of a tool for one situation slides down a slippery slope into “mis-paradigming” for another use case. As an example, we have Productionizing Jupyter notebooks on Hold. We find Jupyter to be an excellent tool for analyzing data to reach conclusions and draw insights, but if you want to use Jupyter directly in production (rather than porting your code to a more production-oriented language such as Python), you may create significant pain for yourself.
Similarly, it’s easy to drop a file into an Amazon S3 bucket, hook that up to Simple Notification Service (SNS) and fire off a Lambda to process its contents, but the associated infrastructure tangle might cause more problems than it’s worth.
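To give a feel for that tangle, here is a minimal sketch of what the Lambda side of such a chain might look like in Python. It assumes the commonly documented event shapes: the SNS envelope carries the S3 notification as a JSON string, and S3 URL-encodes object keys. The names `extract_s3_objects` and `handler` are our own, and the actual processing step is left out.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an SNS-wrapped S3 notification."""
    objects = []
    for sns_record in event.get("Records", []):
        # Layer 1: the SNS envelope. The S3 notification is a JSON string.
        s3_event = json.loads(sns_record["Sns"]["Message"])
        # Messages without "Records" (such as S3 test events) are skipped.
        for s3_record in s3_event.get("Records", []):
            # Layer 2: the S3 notification. Object keys are URL-encoded.
            bucket = s3_record["s3"]["bucket"]["name"]
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


def handler(event, context):
    """Hypothetical Lambda entry point; real processing is omitted."""
    return [f"s3://{bucket}/{key}" for bucket, key in extract_s3_objects(event)]
```

Even this small function has to know about two nested message formats and an encoding rule, none of which appear in the business problem of "process the file."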
Really, this is the latest incarnation of the 80% rule and the need to understand it. Many tools will give you great acceleration for 80% of use cases. They just work. For the next 10%, you can probably still get it to do what you want. But for that final 10%, you may have worked yourself into a corner and are in a “pay later” state. Can you really work through the complexity of getting that final 10%? Many teams would do well to step back from the problem now and then and ask: should we work through the pain, or should we consider a different tool for our use-case?
The microservices operating model
My erstwhile colleague, Aaron Erickson, used to describe microservices as the first true “cloud native” architecture, and we certainly think it’s a reasonable choice for modern systems. Teams should beware of microservice envy and understand that microservices trade reduced development complexity for increased operational complexity. That is to say, while each service is simpler, the choreography required to get them all working together is a lot harder than with a monolithic architecture.
With the continuing popularity of microservices, we are seeing a growing maturity around running microservices with ‘proper’ service management. Old operations techniques have been reinvented with new names such as “error budgets.” SLOs give teams clear metrics to use when determining if a “you build it, you run it” approach is working. Service meshes, such as Istio, directly support these service management concepts. Observability is a term on everyone’s lips, and teams are working hard to ensure the health of their systems is adequately visible to monitoring tools.
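As a rough illustration of how an SLO turns into an error budget, the sketch below works through the arithmetic for a request-based availability target. The function and field names are hypothetical, and the request counts are made up for the example; real teams will usually measure over a rolling window.

```python
from dataclasses import dataclass


@dataclass
class BudgetReport:
    allowed_failures: float    # failures the SLO tolerates in this window
    remaining_failures: float  # negative once the budget is overspent
    consumed_fraction: float   # 1.0 means the budget is exactly used up


def error_budget(slo_target, total_requests, failed_requests):
    """Compare observed failures with what an availability SLO allows."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed = (1 - slo_target) * total_requests
    remaining = allowed - failed_requests
    # With no traffic there is no budget to consume.
    consumed = failed_requests / allowed if allowed else 0.0
    return BudgetReport(allowed, remaining, consumed)


# Illustrative numbers: a 99.9% target over one million requests.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"{report.consumed_fraction:.0%} of the error budget is spent")
```

A 99.9% target over one million requests tolerates roughly 1,000 failures, so a team with 250 failures still has most of its budget to spend on releases and experiments.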
A true “microservices operating model” also encompasses shifts in team design and responsibilities and associated organizational impacts. These models are beginning to emerge, but we believe they should be founded on identifying a series of business and platform capabilities, then aligning long-lived teams that plan, build and run products to deliver those capabilities. We’d very much caution against the idea of creating a monolithic ‘platform’ team or simply hoping that shared business capabilities will emerge by chance and be curated by osmosis.
Relentless improvement
If there’s one thing that differentiates ‘good’ from ‘great’ technologists (no matter their specific area of expertise), it’s that people who are great at something are never happy with the current solution, and work to create a better answer. As an industry, we relentlessly improve on our old designs. There might be a good solution to a problem, but someone makes a better solution, and then we iterate.
These individual steps today lead to tomorrow’s innovations, which, as we’ve said before, are fundamentally unpredictable (a core tenet of Evolutionary Architecture is to build systems knowing we can’t effectively predict these changes).
Pushing config too far
Our Radar discussions included some significant time on the problem of “coding in configuration” and the slippery slope that seems to happen where a simple configuration language eventually grows and grows, and you end up with “Turing-complete XML” that’s very difficult to test and reason about. This seems to happen in lots of situations, from build files to templated Terraform configuration files. Teams often end up deciding it’s easier just to write tested code in a programming language rather than using a tool that we have twisted into being programmable config.
Kief Morris, author of Infrastructure as Code, thinks that the underlying problem is that we are failing to keep clear boundaries between what is configuration and what is logic. In his opinion, we shouldn’t be testing config files, and we shouldn’t be defining a target state with programmable logic. If you’re doing either of those, it’s a smell that you’ve got the boundaries wrong.
Morris suggests that declarative languages need to make it easier to extend them by adding logic in separate, testable components. He says: “If I want to add a new thing I can declare, extend the declarative language to add a definition, and then implement the ‘how’ logic that this definition triggers in a proper language, with tests.”
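One way to picture this separation is the following minimal sketch. It is our own illustration rather than any particular tool’s API: the `queue` kind, the `implements` decorator and the `plan` function are all hypothetical. The declarations stay as plain data, while the “how” lives in ordinary functions that can be unit tested.

```python
# Registry mapping each declarable kind to the function that implements it.
HANDLERS = {}


def implements(kind):
    """Register a function as the 'how' behind a declared kind."""
    def register(func):
        HANDLERS[kind] = func
        return func
    return register


@implements("queue")
def plan_queue(spec):
    """Logic for a hypothetical 'queue' declaration, testable on its own."""
    actions = [f"create queue {spec['name']}"]
    if spec.get("dead_letter", False):
        actions.append(f"create queue {spec['name']}-dlq")
    return actions


def plan(declarations):
    """Turn pure-data declarations into a flat list of planned actions."""
    actions = []
    for decl in declarations:
        kind = decl["kind"]
        if kind not in HANDLERS:
            raise ValueError(f"no handler registered for kind {kind!r}")
        actions.extend(HANDLERS[kind](decl))
    return actions


# The configuration itself has no loops or conditionals: it only declares.
CONFIG = [
    {"kind": "queue", "name": "orders", "dead_letter": True},
    {"kind": "queue", "name": "audit"},
]
```

Adding a new declarable thing means registering one more handler and writing tests for it; the config file never needs to grow control flow.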
Orchestration complexity on the rise
A general trend in the technology landscape is one of increasing overall complexity, and we see that with an increase in the orchestration needed between components. Luigi is a new tool in the data space that helps you build complex pipelines of batch jobs. Apache Airflow allows you to programmatically author, schedule, and monitor workflows. We now have all sorts of new things coming into our systems: functions-as-a-service, workflows, Python processes, and so on.
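To make concrete what pipeline orchestrators do at their core, here is a toy sketch that runs tasks in dependency order. It deliberately avoids the Luigi and Airflow APIs and uses only the standard library’s `graphlib` module (Python 3.9 or later). The task names and the `run_pipeline` function are hypothetical, and real tools add scheduling, retries and monitoring on top.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task after the tasks it depends on; return order and results."""
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        # Hand each task the outputs of its direct upstream tasks.
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return order, results


# A hypothetical three-step batch job: extract, then transform, then load.
tasks = {
    "extract": lambda upstream: [3, 1, 2],
    "transform": lambda upstream: sorted(upstream["extract"]),
    "load": lambda upstream: f"loaded {len(upstream['transform'])} rows",
}
# Each key maps a task to the tasks that must finish before it starts.
dependencies = {"extract": [], "transform": ["extract"], "load": ["transform"]}

order, results = run_pipeline(tasks, dependencies)
```

Notice that the dependency map is itself a piece of system design: it is neither application code nor infrastructure, yet the pipeline’s behavior depends on it.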
Neal Ford, deliberately misquoting Dijkstra at our recent Radar meeting, says: “We’ve decoupled things so much we have almost ended up with a spaghetti of stuff, and now we need to put it together to make something useful.”
It’s important to remember that orchestration itself isn’t bad. It’s like coupling. You need a certain amount of it to actually get a system to function and provide value. Whether orchestration is bad or not depends on how pervasive it is within an application and how widely it spreads across the rest of the system.
Business Behavior Diffusion
The problems organizations run into when they push configuration too far, and when they wrestle with the complexity of orchestration, are symptoms of a bigger trend: business logic is no longer confined to small chunks of code; it’s actually diffused throughout a system. Business logic is found in the way we configure Kubernetes pods, and in the way we orchestrate our lambda functions. Scott Shaw puts it like this: “There’s an old fashioned notion that you have a ‘workload’ and you can move that workload to different places. There are responsibilities for writing the workload, hosting the workload, and the control-plane and application are neatly separated. And that’s just not true anymore.”
To be a bit more precise, business logic is arguably just the stuff that does calculations. But there are business-relevant decisions to be made. What should we do if we need to scale in this manner? How should we degrade under load? Those are questions for a product owner, and they are business logic, but not in the traditional sense. We settled on the phrase “business behavior” to encompass these kinds of characteristics.
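As a small, hedged illustration of treating “how should we degrade under load?” as business behavior, the sketch below writes the decision down as an explicit, testable policy instead of leaving it implicit in infrastructure settings. The service modes and thresholds are invented for the example; in practice a product owner would choose them.

```python
from enum import Enum


class ServiceMode(Enum):
    FULL = "full"
    NO_RECOMMENDATIONS = "no_recommendations"  # drop a nice-to-have feature
    READ_ONLY = "read_only"                    # protect the core under stress


# Illustrative thresholds, checked from most to least severe.
DEGRADATION_POLICY = [
    (0.95, ServiceMode.READ_ONLY),
    (0.80, ServiceMode.NO_RECOMMENDATIONS),
]


def choose_mode(utilization):
    """Pick the service mode for a utilization ratio between 0.0 and 1.0."""
    for threshold, mode in DEGRADATION_POLICY:
        if utilization >= threshold:
            return mode
    return ServiceMode.FULL
```

Written this way, the policy can be reviewed by the people who own the decision and covered by ordinary unit tests.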
Our advice is for teams first to appreciate that this diffusion has occurred. In old-school architecture, everything significant would have been in the business-domain layer. But that’s just gone now, with the equivalent logic smeared across a combination of infrastructure, configuration, microservices, and integrations. With event-driven architectures, serverless functions, and files in an S3 bucket triggering a lambda, it’s very hard to follow the flow of logic throughout a system. Teams should strive to use patterns that make overall behavior easier to understand, possibly even by (gasp!) reducing the number of technologies they use in any one system.