Rethinking building on the cloud: Part 3: Principles to maximize environments

As mentioned in my last blog, the newly launched Mingle SaaS runs entirely on the AWS cloud, for which we designed the architecture from scratch. Rethinking our approach to environments in the development and deployment process has allowed us to shed a lot of the problems experienced in traditional data center environments. Following the principles described here has increased the reliability of our system, streamlined our deployment process and sped up our development. There are three principles we follow to ensure that we maximize these benefits.

Ad hoc environments

Ensure that environments can be created and destroyed quickly and on demand.

We ensure that it is trivial to deploy a complete, new instance of the system, rather than relying on a set of pre-existing environments or running partial, crippled ones (for example, simplified systems deployed on developers' boxes).

This means that everywhere we test, we are using realistic environments, and so we are less likely to encounter problems further down the pipeline. It means that we can create short-lived environments for special purposes. It means that there is no contention for the use of a small set of realistic environments. And it means that we can dispose of environments when we are done with them, rather than have them sitting around consuming resources and accumulating configuration drift.

To achieve this we automate everything required for the creation of a new environment. A script takes the name of the environment as a parameter and creates everything that the environment needs, from scratch. Another script destroys the environment when we're done with it.
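The shape of such a pair of scripts can be sketched as follows. This is a hypothetical illustration, not Mingle's actual tooling: the resource naming scheme is an assumption, and the real provisioning calls (CloudFormation, RDS and so on) are shown only as comments.

```python
def plan_environment(name):
    """Derive everything the environment needs from its name alone.

    The naming scheme here is illustrative; the point is that the
    environment name is the only input.
    """
    return {
        "stack": f"mingle-{name}",
        "database": f"mingle-{name}-rds",
        "packages": f"mingle-{name}-packages",
        "monitoring": f"mingle-{name}-monitoring",
    }


def create_environment(name):
    """Create a complete, new instance of the system from scratch."""
    plan = plan_environment(name)
    # A real script would shell out to the cloud provider here, e.g.:
    #   aws cloudformation create-stack --stack-name mingle-<name> ...
    for resource in plan.values():
        print(f"creating {resource}")
    return plan


def destroy_environment(name):
    """Tear everything down again when we are done with it."""
    plan = plan_environment(name)
    # e.g. aws cloudformation delete-stack --stack-name mingle-<name>
    for resource in plan.values():
        print(f"destroying {resource}")
```

Because the script derives every resource from the environment name, `create_environment("perf-test")` and `destroy_environment("perf-test")` bracket the entire lifecycle of a short-lived environment.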

In practice we create many completely new development environments several times a day. We also have two long-running environments: one for staging changes before release and one that runs our production system.

Shared-nothing environments

Do not share any infrastructure between environments.

Every environment has its own database servers, package repositories, monitoring systems and so on. There are three reasons for this:

  1. Shared infrastructure creates a touch point that allows environments to interfere with one another; for example, heavy load on one environment could affect the performance of another.
  2. It causes migration problems: we would like every infrastructure change we make to be tested as it progresses up a delivery pipeline; if two environments in the pipeline share a piece of infrastructure then an upgrade will affect both of them at once.
  3. The lack of sharing simplifies the versioning of packages deployed to the system; with no shared package repository, it is clear what version is deployed because there is only one version present in any environment.
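The isolation property can be made concrete: when every piece of infrastructure is namespaced by environment, two environments are disjoint by construction and can never reference the same database, repository or monitoring system. A minimal sketch, with an assumed naming scheme:

```python
def environment_resources(name):
    # Every piece of infrastructure is namespaced by environment;
    # nothing is looked up from a shared registry. (Illustrative names.)
    return {
        f"rds/{name}",
        f"package-repo/{name}",
        f"monitoring/{name}",
    }


# Disjoint by construction: an upgrade in "staging" cannot
# touch anything that "production" depends on.
assert environment_resources("staging").isdisjoint(
    environment_resources("production"))
```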

This approach does have some costs. We have to pay for the infrastructure for every environment (an RDS instance for each, for example). And creating everything whenever we spin up an environment can take time (again, RDS instances are notable here).

Cookie-cutter environments

Minimize the difference between environments.

Every environment we create is essentially identical. In particular, network topologies, instance types and clustering approaches are the same in our development environments as they are in production.

Configuration required for individual environments is limited to the environment name, the root domain for the environment’s DNS entries, an email address for alerts and some security details.
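That small per-environment configuration can be sketched as a single structure from which everything else is derived. The field names and derivation below are illustrative assumptions, not the actual Mingle configuration:

```python
from dataclasses import dataclass


@dataclass
class EnvironmentConfig:
    # The only inputs that vary between environments (illustrative fields;
    # the real config also carries some security details).
    name: str
    root_domain: str
    alert_email: str

    def app_hostname(self):
        # Everything else is derived identically for every environment.
        return f"{self.name}.{self.root_domain}"


dev = EnvironmentConfig("dev42", "example.com", "team@example.com")
prod = EnvironmentConfig("production", "example.com", "ops@example.com")
```

Because the derivation logic is shared, only the inputs differ: `dev.app_hostname()` and `prod.app_hostname()` are produced by exactly the same code path.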

Because there are no significant differences between environments, our deployment process is relatively simple and we see very few bugs caused by differences between the environments.

We can do this in the main because we don’t share any infrastructure between environments, so there is no need to pass in configuration identifying infrastructural services.

Another technique that minimizes differences is auto-scaling. All environments start equal at deploy-time and then adapt to the load placed on them, rather than having heavily-loaded environments like production provisioned with more resources than, say, a development environment.
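One way to picture this: every environment is created with the same scaling policy, and only load determines its eventual size. A hypothetical sketch (the bounds and group name are invented for illustration):

```python
def scaling_group(environment_name):
    # Every environment, development or production, gets the same
    # scaling policy at deploy time; load, not configuration,
    # determines how large it eventually grows. Bounds are illustrative.
    return {
        "name": f"{environment_name}-web",
        "min_size": 2,
        "max_size": 20,
        "desired": 2,  # all environments start equal at deploy time
    }


# The policy is identical; nothing in the config says "production is big".
assert scaling_group("dev")["max_size"] == scaling_group("production")["max_size"]
```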

Principles in practice

Following these principles has an impact on many aspects of the system's design. It has required us to be very disciplined when making changes and forced us to reject some approaches which were otherwise attractive. However, the effort required to adhere to these practices has been worth it.

Keeping all environments completely isolated means that problems with one environment cannot contaminate another; and that all changes to all components in our system are tested by going through the pipeline rather than being imposed on multiple environments at once.

Ensuring that we can spin up such environments quickly, and as needed, means that we never waste time waiting for environments to be available or reconfiguring them to our immediate needs.

And using full environments, identical to production, at every stage in our development pipeline means that we don’t experience errors caused by unanticipated differences between environments; and that we fully integrate all the components in our system early on in the pipeline, giving us fast feedback on any inconsistencies.

Mingle is now on the cloud. Start your project now!