The API expand-contract pattern, sometimes called parallel change, will be familiar to many, especially when used with databases or code; however, we see only low levels of adoption with APIs. Specifically, we're seeing complex versioning schemes and breaking changes used in scenarios where a simple expand and then contract would suffice: first add new elements to the API while deprecating the existing ones, and only later, once consumers have switched to the newer schema, remove the deprecated elements. This approach does require some coordination with, and visibility of, the API consumers, perhaps through a technique such as consumer-driven contract testing.
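As a minimal sketch of the expand phase, a response can carry both the deprecated field and its replacement until consumers have migrated. The field names (`full_name`, `given_name`, `family_name`) are illustrative, not from any particular API:

```python
# Hypothetical expand-contract sketch: during the expand phase the response
# includes both the new fields and the deprecated one.

def build_customer_response(customer: dict) -> dict:
    response = {
        # New, preferred schema (expand)
        "given_name": customer["given_name"],
        "family_name": customer["family_name"],
    }
    # Deprecated field kept temporarily for existing consumers; remove it in
    # the contract phase once consumer-driven contract tests show that no
    # consumer still depends on it.
    response["full_name"] = f"{customer['given_name']} {customer['family_name']}"
    return response

print(build_customer_response({"given_name": "Ada", "family_name": "Lovelace"}))
```

The contract phase is then a trivial change: delete the deprecated field once the consumer contracts no longer reference it.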
We see continuous delivery for machine learning (CD4ML) as a good default starting point for any ML solution that is being deployed into production. Many organizations are becoming more reliant on ML solutions for both customer offerings and internal operations so it makes sound business sense to apply the lessons and good practice captured by continuous delivery (CD) to ML solutions.
As application development becomes increasingly dynamic and complex, it's a challenge to deliver accessible and usable products with consistent style. This is particularly true in larger organizations with multiple teams working on different products. Design systems define a collection of design patterns, component libraries and good design and engineering practices that ensure consistent digital products. Built on the corporate style guides of the past, design systems offer shared libraries and documents that are easy to find and use. Generally, guidance is written down as code and kept under version control so that the guide is less ambiguous and easier to maintain than simple documents. Design systems have become a standard approach when working across teams and disciplines in product development because they allow teams to focus. They can address strategic challenges around the product itself without reinventing the wheel every time a new visual component is needed.
As noted in one of the themes for this edition, the industry is increasingly gaining experience with platform engineering product teams that create and support internal platforms. These platforms are used by teams across an organization and accelerate application development, reduce operational complexity and improve time to market. With increasing adoption we're also clearer on both good and bad patterns for this approach. When creating a platform, it’s critical to have clearly defined customers and products that will benefit from it rather than building in a vacuum. We caution against layered platform teams that simply preserve existing technology silos but apply the "platform team" label as well as against ticket-driven platform operating models. We're still big fans of using concepts from Team Topologies as we think about how best to organize platform teams. We consider platform engineering product teams to be a standard approach and a significant enabler for high-performing IT.
We strongly advise organizations that really do need to use cloud service accounts to rotate the credentials regularly. Rotation is one of the three Rs of security. It is far too easy for organizations to forget about these accounts until an incident occurs, which leads to accounts with unnecessarily broad permissions remaining in use for long periods, with no plan for how to replace or rotate them. Regularly applying a cloud service account rotation approach also provides a chance to exercise the principle of least privilege.
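The core of a rotation policy is simply knowing which credentials have aged past a threshold. A minimal sketch, assuming a 90-day policy and an illustrative record shape (a real implementation would query the cloud provider's IAM API for key metadata):

```python
# Illustrative sketch: flag service-account credentials whose age exceeds a
# rotation policy. The 90-day threshold and record shape are assumptions.
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=90)

def keys_due_for_rotation(keys, now):
    """Return the IDs of credentials older than the rotation threshold."""
    return [k["id"] for k in keys if now - k["created"] > MAX_KEY_AGE]

now = datetime(2021, 6, 1, tzinfo=timezone.utc)
keys = [
    {"id": "ci-deployer", "created": datetime(2021, 5, 1, tzinfo=timezone.utc)},
    {"id": "legacy-batch", "created": datetime(2020, 1, 1, tzinfo=timezone.utc)},
]
print(keys_due_for_rotation(keys, now))  # → ['legacy-batch']
```

Running a check like this on a schedule, and alerting on the result, keeps forgotten accounts from silently accumulating.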
As the cloud becomes more of a commodity and spinning up cloud sandboxes becomes easier and possible at scale, our teams prefer cloud-only (as opposed to local) development environments to reduce maintenance complexity. We're seeing that the tooling for local simulation of cloud-native services limits confidence in developer build and test cycles; therefore, we're focusing on standardizing cloud sandboxes rather than running cloud-native components on a developer machine. This acts as a forcing function for good infrastructure-as-code practices and good onboarding processes for provisioning developer sandbox environments. There are risks in this transition: it makes developers entirely dependent on cloud environment availability, and it may slow down the developer feedback loop. We strongly recommend you adopt some lean governance practices regarding standardization of these sandbox environments, especially with regard to security, IAM and regional deployments.
Contextual bandits is a type of reinforcement learning that is well suited to problems with exploration/exploitation trade-offs. Named after "bandits," or slot machines, in casinos, the algorithm explores different options to learn more about expected outcomes and balances that exploration by exploiting the options that perform well. We've successfully used this technique in scenarios where we've had too little data to train and deploy other machine-learning models. The fact that we can add context to the explore/exploit trade-off makes it suitable for a wide variety of use cases, including A/B testing, recommendations and layout optimization.
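To make the explore/exploit mechanics concrete, here is a minimal epsilon-greedy contextual bandit sketch; it keeps a running mean reward per (context, arm) pair and mostly exploits the best-known arm while occasionally exploring. The contexts, arms and simulated rewards are illustrative, and production use would typically involve a richer model of context:

```python
# Minimal epsilon-greedy contextual bandit (illustrative, not a library API).
import random
from collections import defaultdict

class EpsilonGreedyBandit:
    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, arm) -> number of pulls
        self.values = defaultdict(float)  # (context, arm) -> mean reward

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore
        # exploit: arm with the best estimated reward for this context
        return max(self.arms, key=lambda a: self.values[(context, a)])

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        # incremental running-mean update
        self.values[key] += (reward - self.values[key]) / self.counts[key]

# e.g., choosing a page layout per device type
bandit = EpsilonGreedyBandit(arms=["layout_a", "layout_b"], seed=42)
for _ in range(500):
    arm = bandit.choose("mobile")
    reward = 1.0 if arm == "layout_b" else 0.0  # simulated: layout_b converts better
    bandit.update("mobile", arm, reward)
```

After a few hundred rounds the bandit converges on the better-performing layout for the "mobile" context while still sampling the alternative occasionally.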
When building Docker images for our applications, we're often concerned with two things: the security and the size of the image. Traditionally, we've used container security scanning tools to detect and patch common vulnerabilities and exposures, and small distributions such as Alpine Linux to address image size and distribution performance. But with rising security threats, eliminating all possible attack vectors is more important than ever. That's why distroless Docker images are becoming the default choice for deployment containers. Distroless Docker images reduce the footprint and dependencies by doing away with a full operating system distribution. This technique reduces security-scan noise and the application attack surface, and fewer vulnerabilities need to be patched; as a bonus, these smaller images are more efficient. Google has published a set of distroless container images for different languages. You can create distroless application images using the Google build tool Bazel or simply use multistage Dockerfiles. Note that distroless containers by default don't have a shell for debugging; however, debug versions of distroless containers, including a BusyBox shell, are readily available online. Distroless Docker images are a technique pioneered by Google and, in our experience, still largely confined to Google-generated images. We would be more comfortable if there were more than one provider to choose from. Also, use caution when applying Trivy or similar vulnerability scanners, since only more recent versions support distroless containers.
The group behind Ethical OS — the Omidyar Network, a self-described social change venture created by eBay founder Pierre Omidyar — has released a new iteration called Ethical Explorer. The new Ethical Explorer pack draws on lessons learned from using Ethical OS and adds further questions for product teams to consider. The kit, which can be downloaded for free and folded into cards to trigger discussion, has open-ended question prompts for several technical "risk zones," including surveillance ("can someone use our product or service to track or identify other users?"), disinformation, exclusion, algorithmic bias, addiction, data control, bad actors and outsized power. The included field guide has activities and workshops, ideas for starting conversations and tips for gaining organizational buy-in. While we've a long way to go as an industry to better represent the ethical externalities of our digital society, we've had some productive conversations using Ethical Explorer, and we're encouraged by the broadening awareness of the importance of product decisions in addressing societal issues.
We're often asked to refresh, update or remediate legacy systems that we didn't originally build. Sometimes technical issues, such as poor performance or reliability, need our attention. One common approach to address these issues is to create "technical stories" using the same format as a user story but with a technical outcome rather than a business one. But these technical tasks are often difficult to estimate, take longer than anticipated or don't end up having the desired outcome. An alternative, more successful method is to apply hypothesis-driven legacy renovation. Rather than working toward a standard backlog, the team takes ownership of a measurable technical outcome and collectively establishes a set of hypotheses about the problem. They then conduct iterative, time-boxed experiments to verify or disprove each hypothesis in order of priority. The resulting workflow is optimized for reducing uncertainty rather than following a plan toward a predictable outcome.
As organizations drive toward evolutionary architecture, it's important to capture decisions around design, architecture, techniques and teams' ways of working. The process of collecting and aggregating the feedback that leads to these decisions begins with requests for comments (RFCs). RFCs are a technique for collecting context, design and architectural ideas and collaborating with teams to ultimately reach decisions, along with their context and consequences. We recommend that organizations take a lightweight approach to RFCs by using a simple standardized template across many teams as well as version control to capture them.
It's important to keep an auditable record of these decisions to benefit future team members and to capture the technical and business evolution of an organization. Mature organizations have used RFCs in autonomous teams to drive better communication and collaboration, especially in decisions that cross team boundaries.
All major cloud providers offer a dazzling array of machine-learning (ML) solutions. These powerful tools can provide a lot of value, but come at a cost. There is the pure run cost for these services charged by the cloud provider. In addition, there is a kind of operations tax. These complex tools need to be understood and operated, and with each new tool added to the architecture this tax burden increases. In our experience, teams often choose complex tools because they underestimate the power of simpler tools such as linear regression. Many ML problems don't require a GPU or neural networks. For that reason we advocate for the simplest possible ML, using simple tools and models and a few hundred lines of Python on the compute platform you have at hand. Only reach for the complex tools when you can demonstrate the need for them.
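As an illustration of just how little machinery a simple model needs, here is a self-contained ordinary least squares fit for a single feature in plain Python; the data points are made up:

```python
# The "simplest possible ML" in practice: one-feature linear regression,
# no GPU, no framework. The sample data is illustrative.

def fit_linear(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# e.g., predicting delivery time from distance
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.1, 5.9, 8.0]
slope, intercept = fit_linear(xs, ys)
print(round(slope, 2), round(intercept, 2))
```

If a model like this, or a regression from an off-the-shelf statistics library, solves the problem, the operations tax of a managed ML platform is hard to justify.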
The strangler fig pattern is often the default strategy for legacy modernization, where the new code wraps around the old and slowly absorbs the ability to handle all the needed functionality. That sort of "outside-in" approach works well for a number of legacy systems, but now that we've had enough experience with single-page applications (SPA) for them to become legacy systems themselves, we're seeing the opposite "inside-out" approach used to replace them. Instead of wrapping the legacy system, we instead embed the beginning of the new SPA into the HTML document containing the old one and let it slowly expand in functionality. The SPA frameworks don't even need to be the same as long as users can tolerate the performance hit of the increased page size (e.g., embedding a new React app inside an old AngularJS one). SPA injection allows you to iteratively remove the old SPA until the new one completely takes over. Whereas a strangler fig can be viewed as a type of parasite that uses the host tree's stable external surface to support itself until it takes root and the host itself dies, this approach is more like injecting an outside agent into the host, relying on functionality of the original SPA until it can completely take over.
A system's architecture mimics organizational structure and its communication. It's not big news that we should be intentional about how teams interact; see, for instance, the Inverse Conway Maneuver. Team interaction is one of the variables that determine how fast and how easily teams can deliver value to their customers. We were happy to find a way to measure these interactions; we used the assessment created by the authors of Team Topologies, which gives you an understanding of how easy or difficult teams find it to build, test and maintain their services. By measuring team cognitive load, we could better advise our clients on how to change their teams' structure and evolve their interactions.
Many of our developers coding iOS in Xcode get headaches because the Xcodeproj file changes with every project change. The Xcodeproj file format is not human-readable, so handling merge conflicts is quite complicated and can lead to productivity loss and risk of messing up the entire project; if anything goes wrong with the file, Xcode won't work properly and developers will very likely be blocked. Instead of trying to merge and fix the file manually or version it, we recommend you use a tool-managed Xcodeproj approach: define your Xcode project configuration in YAML (XcodeGen, Struct), Ruby (Xcake) or Swift (Tuist). These tools generate the Xcodeproj file based on a configuration file and the project structure. As a result, merge conflicts in the Xcodeproj file will be a thing of the past, and when they do happen in the configuration file, they're much easier to handle.
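As a sketch of what this looks like with XcodeGen, a minimal project.yml might resemble the following (the project name, bundle prefix, paths and deployment target are illustrative):

```yaml
# Hypothetical XcodeGen project.yml; the .xcodeproj is generated from this,
# so only this readable file needs to be merged and versioned.
name: MyApp
options:
  bundleIdPrefix: com.example
targets:
  MyApp:
    type: application
    platform: iOS
    deploymentTarget: "13.0"
    sources: [Sources]
  MyAppTests:
    type: bundle.unit-test
    platform: iOS
    sources: [Tests]
    dependencies:
      - target: MyApp
```

Running `xcodegen generate` recreates the Xcodeproj file from this spec, so the generated file can be excluded from version control entirely.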
With TypeScript becoming a common language for front-end development and Node.js becoming the preferred BFF technology, we're seeing increasing use of UI/BFF shared types. In this technique, a single set of type definitions is used to define both the data objects returned by front-end queries and the data served to satisfy those queries by the back-end server. Ordinarily, we would be cautious about this practice because of the unnecessarily tight coupling it creates across process boundaries. However, many teams are finding that the benefits of this approach outweigh any risks of tight coupling. Since the BFF pattern works best when the same team owns both the UI code and the BFF, often storing both components in the same repository, the UI/BFF pair can be viewed as a single cohesive system. When the BFF offers strongly typed queries, the results can be tailored to the specific needs of the front end rather than reusing a single, general-purpose entity that must serve the needs of many consumers and contain more fields than actually required. This reduces the risk of accidentally exposing data that the user shouldn't see, prevents incorrect interpretation of the returned data object and makes the query more expressive. This practice is particularly useful when implemented with io-ts to enforce run-time type safety.
One of the most nuanced decisions facing companies at the moment is the adoption of low-code or no-code platforms, that is, platforms that solve very specific problems in very limited domains. Many vendors are pushing aggressively into this space. The problems we see with these platforms typically relate to an inability to apply good engineering practices such as versioning; testing, too, is typically really hard. However, we noticed some interesting new entrants to the market (including Amazon Honeycode, which makes it easy to create simple task or event management apps, and Parabola for IFTTT-like cloud workflows), which is why we're once again including bounded low-code platforms in this volume. Nevertheless, we remain deeply skeptical about their wider applicability since these tools, like Japanese knotweed, have a knack of escaping their bounds and tangling everything together. That's why we still strongly advise caution in their adoption.
In 2016, Christopher Allen, a key contributor to SSL/TLS, inspired us with an introduction of 10 principles underpinning a new form of digital identity and a path to get there, the path to self-sovereign identity. Self-sovereign identity, also known as decentralized identity, is a “lifetime portable identity for any person, organization, or thing that does not depend on any centralized authority and can never be taken away,” according to the Trust over IP standard. Adopting and implementing decentralized identity is gaining momentum and becoming attainable. We see its adoption in privacy-respecting customer health applications, government healthcare infrastructure and corporate legal identity. If you want to rapidly get started with decentralized identity, you can assess Sovrin Network, Hyperledger Aries and Indy OSS, as well as decentralized identifiers and verifiable credentials standards. We're watching this space closely as we help our clients with their strategic positioning in the new era of digital trust.
A deployment drift radiator makes version drift visible for deployed software across multiple environments. Organizations using automated deployments may require manual approvals for environments that get closer to production, meaning the code in these environments might well be lagging several versions behind current development. This technique makes this lag visible via a simple dashboard showing how far behind each deployed component is for each environment. This helps to highlight the opportunity cost of completed software not yet in production while drawing attention to related risks such as security fixes not yet deployed.
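The data behind such a radiator is straightforward to compute: for each environment, count how many releases sit between the deployed version and the latest one. A minimal sketch with made-up version history and environment names:

```python
# Illustrative drift calculation for a deployment drift radiator.

def versions_behind(latest, deployed, history):
    """Count releases between the deployed version and the latest one."""
    return history.index(latest) - history.index(deployed)

history = ["1.0.0", "1.1.0", "1.2.0", "1.3.0"]  # oldest → newest
deployments = {"qa": "1.3.0", "staging": "1.2.0", "production": "1.0.0"}

for env, version in deployments.items():
    lag = versions_behind("1.3.0", version, history)
    print(f"{env:<12} {version}  ({lag} version(s) behind)")
```

Rendering these numbers on a shared dashboard, ideally with the age of each lagging version, is what turns the calculation into a radiator.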
Fully homomorphic encryption (HE) refers to a class of encryption methods that allow computations (such as search and arithmetic) to be performed directly on encrypted data. The result of such a computation remains in encrypted form, which at a later point can be decrypted and revealed. Although the HE problem was first proposed in 1978, a solution wasn't constructed until 2009. With advances in computing power and the availability of easy-to-use open-source libraries — including SEAL, Lattigo, HElib and partially homomorphic encryption in Python — HE is becoming feasible in real-world applications. The motivating scenarios include privacy-preserving use cases, where computation can be outsourced to an untrusted party, for example, running computation on encrypted data in the cloud, or enabling a third party to aggregate homomorphically encrypted intermediate federated machine learning results. Moreover, most HE schemes are considered to be secure against quantum computers, and efforts are underway to standardize HE. Despite its current limitations, namely performance and feasibility of the types of computations, HE is worth your attention.
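The flavor of computing on ciphertexts can be shown with the *partially* homomorphic Paillier scheme, which supports addition only; the toy key sizes below are wildly insecure and purely for illustration, and real workloads would use a library such as SEAL, Lattigo or HElib:

```python
# Toy Paillier demo: multiplying ciphertexts adds the underlying plaintexts.
# Tiny, insecure parameters for illustration only.
import math
import random

p, q = 293, 433                      # insecure demo primes
n = p * q
n_sq = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n_sq)), -1, n)  # modular inverse mod n

def encrypt(m, rng=random.Random(0)):
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    return (L(pow(c, lam, n_sq)) * mu) % n

c1, c2 = encrypt(10), encrypt(32)
c_sum = (c1 * c2) % n_sq             # ciphertext product → plaintext sum
print(decrypt(c_sum))                # → 42
```

An untrusted party holding only `c1` and `c2` can compute `c_sum` without ever seeing 10 or 32, which is the essence of the outsourced-computation scenarios described above.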
The Open Application Model (OAM) is an attempt to bring some standardization to the space of shaping infrastructure platforms as products. Using the abstractions of components, application configurations, scopes and traits, developers can describe their applications in a platform-agnostic way, while platform implementers define their platform in terms of workload, trait and scope. Since we last talked about the OAM, we've followed one of its first implementations with interest, KubeVela. KubeVela is close to release 1.0, and we're curious to see if implementations like this can substantiate the promise of the OAM idea.
Privacy-focused web analytics is a technique for gathering web analytics without compromising end user privacy by keeping the end users truly anonymous. One surprising consequence of General Data Protection Regulation (GDPR) compliance is the decision taken by many organizations to degrade the user experience with complex cookie consent processes, especially when the user doesn't immediately consent to the "all the cookies" default settings. Privacy-focused web analytics has the dual benefit of both observing the spirit and letter of GDPR while also avoiding the need to introduce intrusive cookie consent forms. One implementation of this approach is Plausible.
Mob programming is one of those techniques that our teams have found to be easier when done remotely. Remote mob programming allows teams to quickly "mob" around an issue or piece of code without the physical constraint of fitting only so many people around a pairing station, and without having to connect to a big display, book a physical meeting room or find a whiteboard.
Secure multiparty computation (MPC) solves the problem of collaborative computing that protects privacy between parties that don't trust each other. Its aim is to safely compute an agreed-upon function without a trusted third party, so that each participant learns only the agreed result and no party's private input can be obtained by the others. A simple illustration of MPC is the millionaires' problem: two millionaires want to know who is richer, but neither wants to reveal their actual net worth to the other or trust a third party. Implementation approaches for MPC vary and may include secret sharing, oblivious transfer, garbled circuits or homomorphic encryption. Some commercial MPC solutions that have recently appeared (e.g., Antchain Morse) claim to help solve problems such as secret sharing and secure machine learning in scenarios such as multiparty joint credit investigation and medical records data exchange. Although these platforms are attractive from a marketing perspective, we've yet to see whether they're really useful.
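The secret-sharing building block is simple to sketch: each party splits its private value into random shares that sum to the value modulo a prime, and the parties then combine shares so that only the total is revealed. This toy example computes a joint sum (a simpler relative of the millionaires' comparison) with made-up inputs:

```python
# Minimal additive secret-sharing sketch (illustrative, not production MPC).
import random

PRIME = 2_147_483_647  # field modulus (a Mersenne prime)
rng = random.Random(7)

def share(value, n_parties):
    """Split a value into n random shares that sum to it mod PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reveal_sum(all_shares):
    # Each party sums the shares it received; combining those partial sums
    # reveals only the total, never any individual input.
    partial = [sum(col) % PRIME for col in zip(*all_shares)]
    return sum(partial) % PRIME

wealths = [57, 12, 31]  # private inputs of three parties
shared = [share(w, 3) for w in wealths]
print(reveal_sum(shared))  # → 100
```

Any single share is a uniformly random field element, so no coalition smaller than all the share-holders learns anything about an individual input.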
We suggest approaching GitOps with a degree of care, especially with regard to branching strategies. GitOps can be seen as a way of implementing infrastructure as code that involves continuously synchronizing and applying infrastructure code from Git into various environments. When used with a "branch per environment" infrastructure, changes are promoted from one environment to the next by merging code. While treating code as the single source of truth is clearly a sound approach, we're seeing branch per environment lead to environmental drift and eventually environment-specific configs as code merges become problematic or even stop entirely. This is very similar to what we've seen in the past with long-lived branches with GitFlow.
The explosion of interest around software platforms has created a lot of value for organizations, but the path to building a platform-based delivery model is fraught with potential dead ends. It's common in the excitement of new paradigms to see a resurgence of older techniques rebranded with the new vernacular, making it easy to lose sight of the reasons we moved past those techniques in the first place. For an example of this rebranding, see our blip on traditional ESBs make a comeback as API gateways in the previous Radar. Another example we're seeing is rehashing the approach of dividing teams by technology layer but calling them platforms. In the context of building an application, it used to be common to have a front-end team separate from the business logic team separate from the data team, and we see analogs to that model when organizations segregate platform capabilities among teams dedicated to a business or data layer. Thanks to Conway's Law, we know that organizing platform capability teams around business capabilities is a more effective model, giving the team end-to-end ownership of the capability, including data ownership. This helps to avoid the dependency management headaches of layered platform teams, with the front-end team waiting on the business logic team waiting on the data team to get anything done.
Password policies are a standard default for many organizations today. However, we're still seeing organizations requiring passwords to include a variety of symbols, numbers, uppercase and lowercase letters as well as special characters. These naive password complexity requirements lead to a false sense of security, as users opt for more insecure passwords because the alternative is difficult to remember and type. According to NIST recommendations, the primary factor in password strength is password length; users should therefore be encouraged to choose long passphrases, and systems should permit passwords of at least 64 characters (including spaces). Long passphrases are both more secure and more memorable.
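A sketch of what a NIST 800-63B-style check looks like in code: validate primarily on length (permitting spaces and long passphrases) and screen against known-compromised passwords, with no composition rules. The thresholds follow NIST's published guidance, while the tiny blocklist is a stand-in for a real breached-password dataset:

```python
# Illustrative NIST-style password check: length plus a breach blocklist,
# no composition rules. The blocklist here is a toy stand-in.

MIN_LENGTH = 8    # NIST minimum for user-chosen secrets
MAX_LENGTH = 64   # verifiers should permit at least this many characters
BREACHED = {"password", "qwerty123", "letmein"}

def is_acceptable(password):
    if not (MIN_LENGTH <= len(password) <= MAX_LENGTH):
        return False
    return password.lower() not in BREACHED

print(is_acceptable("correct horse battery staple"))  # → True
print(is_acceptable("P@1a"))                          # → False (too short)
```

Note that spaces are allowed, so multi-word passphrases pass, while a short password fails regardless of how many symbol classes it contains.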
Some organizations seem to think peer review equals pull request; they've taken the view that the only way to achieve a peer review of code is via a pull request. We've seen this approach create significant team bottlenecks as well as significantly degrade the quality of feedback as overloaded reviewers begin to simply reject requests. Although the argument could be made that this is one way to demonstrate code review "regulatory compliance," one of our clients was told this was invalid since there was no evidence that the code had actually been read by anyone prior to acceptance. Pull requests are only one way to manage the code review workflow; we urge people to consider other approaches, especially where there is a need to coach and pass on feedback carefully.
Our position on "being agile before doing agile" and our opinions on this topic shouldn't come as a surprise; but since SAFe™ (Scaled Agile Framework®) is, per Gartner's May 2019 report, the most considered and most used enterprise agile framework, and since we're seeing more and more enterprises going through organizational change, we thought it was time to raise awareness of this topic again. We've come across organizations struggling with SAFe's over-standardized, phase-gated processes. Those processes create friction in the organizational structure and its operating model, and they can promote silos in the organization, preventing platforms from becoming real enablers of business capabilities. The top-down control generates waste in the value stream and discourages engineering creativity, while limiting autonomy and experimentation in teams. Rather than measuring effort and focusing on standardized ceremonies, we recommend a leaner, value-driven approach to delivery and governance, such as EDGE, to help eliminate organizational friction, as well as a team cognitive load assessment to identify the types of teams needed and determine how they should better interact with each other.
Scaled Agile Framework® and SAFe™ are trademarks of Scaled Agile, Inc.
Ideally, but especially when teams are practicing DevOps, the deployment pipeline and the code being deployed should be owned by the same team. Unfortunately, we still see organizations where there is separate code and pipeline ownership, with the deployment pipeline configuration owned by the infrastructure team; this results in delays to changes, barriers to improvements and a lack of development team ownership and involvement in deployments. One cause of this can be that the pipeline simply belongs to a separate team; another can be the desire to retain "gatekeeper" processes and roles. Although there can be legitimate reasons for this approach (e.g., regulatory control), in general we find it painful and unhelpful.
One of the ultimate goals of a platform should be to reduce ticket-based processes to an absolute minimum, because they create queues in the value stream. Sadly, we still see organizations not pushing forcefully enough toward this important goal, resulting in a ticket-driven platform operating model. This is particularly frustrating when ticket-based processes sit in front of platforms that are built on top of the self-service, API-driven features of the public cloud vendors. It's hard, and not necessary, to achieve self-service with very few tickets right from the start, but it needs to be the destination.
Over-reliance on bureaucracy and a lack of trust are among the causes of this reluctance to move away from ticket-based processes. Baking more automated checks and alerts into your platform is one way to cut the cord on ticket-based approval processes. For example, give teams visibility into their run costs and put automated guardrails in place to avoid an accidental explosion of costs. Implement security policy as code and use configuration scanners or analyzers such as Recommender to help teams do the right thing.
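As an illustrative sketch of such a guardrail as code, the check below rejects resource requests that would blow through a team's monthly cost budget, replacing a manual ticket approval; the instance sizes, hourly rates and budget are all assumptions:

```python
# Hypothetical automated cost guardrail replacing a ticket-based approval.

HOURLY_RATES = {"small": 0.05, "large": 0.40, "gpu": 2.50}  # example rates
MONTHLY_BUDGET = 500.00
HOURS_PER_MONTH = 730  # average hours in a month

def violates_cost_guardrail(requested):
    """True if the requested instances would exceed the monthly budget."""
    monthly = sum(HOURLY_RATES[size] * count * HOURS_PER_MONTH
                  for size, count in requested.items())
    return monthly > MONTHLY_BUDGET

print(violates_cost_guardrail({"small": 4}))  # 4 * 0.05 * 730 = 146  → False
print(violates_cost_guardrail({"gpu": 2}))    # 2 * 2.50 * 730 = 3650 → True
```

Wired into a provisioning pipeline, a check like this gives an instant, auditable yes/no instead of a queued ticket, and edge cases can still be escalated to a human.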