Choosing the Right Tool to Provision AWS Infrastructure

This article might help you choose the right provisioning tool if you are looking to migrate or build complex infrastructure on AWS.

We were faced with the task of migrating over 300 servers from the client’s data center to AWS. It required building 7 similar environments with subtle differences. Our team consisted of DevOps consultants and one seasoned operations engineer. We wanted to migrate as soon as possible to free up resources in the client’s data center and bring down costs. Thankfully, configuration management using Chef was already in place, which made the task less daunting.

What were we looking for?

We required a tool which met the following criteria:

  1. Ability to provision AWS VPC resources
  2. Modularity: To begin with, we wanted to automate provisioning of one environment and reuse it to build other environments.

We narrowed our search down to CloudFormation and Terraform.

Both are template-driven tools that help developers maintain Infrastructure as Code. The idea is to treat infrastructure code the same way as software code.

CloudFormation is provided by AWS. It can provision almost every service and resource that AWS offers. Terraform, on the other hand, is an open source tool that is drawing attention within the DevOps community. It is created by HashiCorp, the makers of widely used tools like Vagrant, Packer and Consul.

Both CloudFormation and Terraform can make incremental changes to infrastructure.

Comparison between CloudFormation and Terraform

1. Describing resources

Terraform allows a developer to describe resources in HCL (HashiCorp Configuration Language), a friendly DSL that is parsed internally by Terraform, which is written in Go. It enables us to write modules which can be reused to build duplicate environments. Terraform’s DSL leaves only one thing to be desired: because the configuration is purely declarative, it is not possible to use Go’s language features to write more succinct code.

On the other hand, CloudFormation templates are JSON documents, which are verbose and can be hard for developers to understand. We are forced to duplicate the template for each environment. As infrastructure grows, this collection of JSON documents can become difficult to manage. cfn-dsl provides a Ruby DSL around CloudFormation, but it is very basic.

Terraform snippet

resource "aws_instance" "nat" {
    ami = "${var.aws_nat_ami}"
    availability_zone = "us-east-1b"
    instance_type = "m1.small"
    key_name = "${var.aws_key_name}"
    security_groups = ["${aws_security_group.nat.id}"]
    subnet_id = "${aws_subnet.us-east-1b-public.id}"
    associate_public_ip_address = true
    source_dest_check = false
}
 
resource "aws_subnet" "us-east-1b-public" {
    vpc_id = "${aws_vpc.nat-vpc.id}"
    cidr_block = "10.0.0.0/24"
    availability_zone = "us-east-1b"
}
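
To illustrate the modularity mentioned earlier, here is a minimal sketch of how resources like these could be wrapped in a module and instantiated once per environment. The directory layout, the module name "environment" and the variable names vpc_cidr, public_subnet_cidr and availability_zone are assumptions made for illustration, not our actual code.

```hcl
# modules/environment/main.tf (hypothetical path)
# Inputs that differ from one environment to the next.
variable "vpc_cidr" {}
variable "public_subnet_cidr" {}
variable "availability_zone" {}

resource "aws_vpc" "main" {
    cidr_block = "${var.vpc_cidr}"
}

resource "aws_subnet" "public" {
    vpc_id = "${aws_vpc.main.id}"
    cidr_block = "${var.public_subnet_cidr}"
    availability_zone = "${var.availability_zone}"
}

# environments/staging/main.tf (hypothetical path)
# One module block per environment; only the input values change.
module "staging" {
    source = "../../modules/environment"
    vpc_cidr = "10.0.0.0/16"
    public_subnet_cidr = "10.0.0.0/24"
    availability_zone = "us-east-1b"
}
```

Each additional environment would then be another module block with different values, so subtle differences between environments can be expressed without copying the resource definitions.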

CloudFormation snippet

"usEast1bPublic": {
    "Type": "AWS::EC2::Subnet",
    "Properties": {
        "VpcId": {
            "Ref": "VPC"
        },
        "CidrBlock": "10.0.0.0/24",
        "AvailabilityZone": "us-east-1b"
    }
},

"Nat": {
    "Type": "AWS::EC2::Instance",
    "Properties": {
        "AvailabilityZone": "us-east-1b",
        "DisableApiTermination": "false",
        "ImageId": "ami-2e1bc047",
        "InstanceType": "m1.small",
        "SubnetId": {
            "Ref": "usEast1bPublic"
        },
        "KeyName": "gwt_gdc_to_aws_migration",
        "Monitoring": "false"
    }
}

2. Visibility of changes to be applied

Most provisioning tools, including CloudFormation, apply changes directly to infrastructure. Infrastructure developers have to work out the effects of a change themselves, which becomes intractable in a large infrastructure.

Terraform helps here by making infrastructure changes in two phases: planning (terraform plan) and execution (terraform apply). In the planning phase, Terraform draws up an action plan. The plan lists every action to be taken: which resources will be created, destroyed or modified. At this point, the developer can review the plan before choosing to apply it. Next, in the execution phase, Terraform applies the changes to the infrastructure. Separate planning and execution phases provide visibility into, and control over, infrastructure changes.

3. Handling failure

When provisioning fails, the two tools react differently. CloudFormation provides an option to roll back the entire execution, or to fix deviations manually. In rare cases, the stack can get stuck in a frozen state and cannot be recovered without help from Amazon’s support center.

On the other hand, Terraform marks the failed resource as “tainted”. On the next execution, it removes the tainted resources and attempts to re-provision them.

I prefer Terraform’s way of handling failure, because it does not rebuild successfully provisioned resources and focuses only on the tainted ones.

4. Support for AWS Resources

CloudFormation can provision almost all services provided by AWS, such as VPC, Auto Scaling groups, Amazon CloudWatch, AWS Elastic Beanstalk applications and AWS OpsWorks.

Terraform supports most of the building blocks required for setting up an AWS VPC, but it lacked a few, which slowed us down. For example, when we started using it, it did not support CRUD operations for network ACLs or virtual private gateways. Cross-zone load balancing and disk encryption were not supported either. We worked around this by contributing to Terraform. We also used CloudFormation to set up the virtual private gateway, since it was a one-time activity.

This may be a crucial deciding factor for you when considering project timelines.

5. Community involvement

CloudFormation is maintained by AWS and is closed to changes proposed by the public.

On the other hand, I was able to contribute to Terraform support for AWS features such as network ACLs, toggling cross-zone load balancing and instance tenancy selection. Though Terraform is at a nascent stage, its maintainers are open to pull requests, actively provide feedback and merge them.
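
For a sense of what two of these features look like in configuration, here is a rough sketch of a network ACL and a load balancer with cross-zone balancing enabled. It reuses the aws_vpc.nat-vpc and aws_subnet.us-east-1b-public references from the Terraform snippet earlier in this article; the resource names "public" and "web", the ELB name and the port numbers are placeholders chosen for illustration. Argument names are as I understand the AWS provider and may differ between Terraform versions.

```hcl
# Network ACL that allows inbound HTTPS from anywhere (illustrative rule).
resource "aws_network_acl" "public" {
    vpc_id = "${aws_vpc.nat-vpc.id}"

    ingress {
        protocol = "tcp"
        rule_no = 100
        action = "allow"
        cidr_block = "0.0.0.0/0"
        from_port = 443
        to_port = 443
    }
}

# Load balancer with cross-zone load balancing switched on.
resource "aws_elb" "web" {
    name = "web-elb"
    subnets = ["${aws_subnet.us-east-1b-public.id}"]
    cross_zone_load_balancing = true

    listener {
        instance_port = 80
        instance_protocol = "http"
        lb_port = 80
        lb_protocol = "http"
    }
}
```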

6. State management

At the core of Terraform lies its state management. On every infrastructure update, it records the state of resources, locally or remotely, in a JSON file. It does not retrieve the current state of infrastructure from the cloud provider; instead, it refers to this recorded state.

If multiple developers are working on the same environment, they are forced to share the same state file. If there is no state file, Terraform assumes that nothing exists yet and will create duplicate resources. We worked around this problem by keeping a single, centralised copy of the state file.

Terraform does not reverse-sync the state of real infrastructure. If developers make manual changes to infrastructure, they will go unnoticed. We worked around this problem by strictly prohibiting manual changes to environments.

CloudFormation, on the other hand, does not require developers to keep any state file. Multiple developers can work on an environment without any conflicts.

7. Cloud agnostic

CloudFormation is an AWS product. If you want to spread your infrastructure across different cloud vendors, CloudFormation is restrictive.

On another engagement, we partnered with a client to build a template-driven open-source tool for the vCloud platform. We could not reuse it for other cloud providers, so we know how limiting this can be. Terraform addresses this by avoiding vendor lock-in: the same tool and configuration language can be used across cloud providers.
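
As a rough sketch of what this looks like in practice, a single configuration can declare resources from more than one provider. DigitalOcean is used here purely as an example of a second provider; it was not part of our migration. The variable names do_token and aws_ami, the resource name "app", and the image, region and size values are placeholders that would need to be replaced with values valid for your accounts.

```hcl
# Credentials and image IDs are passed in as variables (hypothetical names).
variable "do_token" {}
variable "aws_ami" {}

provider "aws" {
    region = "us-east-1"
}

provider "digitalocean" {
    token = "${var.do_token}"
}

# A server on AWS ...
resource "aws_instance" "app" {
    ami = "${var.aws_ami}"
    instance_type = "m1.small"
}

# ... and a comparable server on a second provider, in the same file.
resource "digitalocean_droplet" "app" {
    name = "app-1"
    image = "ubuntu-14-04-x64"
    region = "nyc2"
    size = "512mb"
}
```

Note that the resource definitions themselves remain provider-specific: the workflow and the language carry over between vendors, but an aws_instance block cannot simply be pointed at another cloud.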

8. Stability

Terraform’s latest release at the time of writing is 0.3.6. It is not yet a mature and stable tool, but it is gradually moving in that direction. CloudFormation, on the other hand, is stable, tried and tested.

Our choice

We selected Terraform over CloudFormation, primarily because it enabled us to write reusable scripts and gave us better visibility into infrastructure updates. Though it is too early to predict its future, we like Terraform’s vision. Of course, it has a long way to go to become a stable product, but it is definitely a tool which will change the way we manage infrastructure.