Nowadays we talk a lot about Continuous Delivery (CD), and there is a good reason for that. In the same way that developing code driven by tests was a defining change in the past few years, the practice of releasing new versions of a system continually is becoming the next big thing.
However, though there are a lot of tools to help you implement CD, it is no simple task. In this post I'll walk you through how the team I’m on is implementing CD using automation as the first step to our goal.
Initially, the deployment process on the project was basically manual. Although we had a document with the task details, almost every time the deployment failed, it was necessary to have some experienced person identify issues and solve them. Besides that, the document changed at each iteration, to accommodate modifications to scripts that had to be run to fix issues. This made the process even more chaotic.
Another big issue was that by being super fragile, the process was very time consuming and the deployment had to happen during a low system utilization period. Which meant that the team had to update the system with the new features at night. Final straw! The team decided to invest in improving this process. And when I say "the team" I really mean across all project roles, and not only the group of developers. We collectively researched what could be improved and how to implement the fixes. Working together with the project managers and the client was critical to providing senior management with visibility of the problem. It then stopped being a team issue and became a company-wide issue.
To give a little bit of context about the project, the codebase is about 6 or 7 years old and was started with one of the first versions of the Rails framework. Today we are using Ruby 1.8 and Rails 2.3. The production environment is located in a private data center and has more than 20 boxes dedicated to run the web server, database and so on. All configuration changes are made by Puppet, which is run at every deployment. The team has a 3-week iteration and we deploy at the end of each iteration.
Solution: Take 1
The first step of improvements was to try to automate the deployment process. Depending on a manual process that is constantly changing was the biggest problem. We set off making the deployment to be as simple as running a one-line script. The automation started right from getting the Git tag for the correct version to be deployed to triggering specific scripts that had to be run at every iteration transparently. At the end of the first improvement phase, the team had a deploy script that with a simple command line would do all the system state validations, update the code and database and verify that it was in a consistent state. However, even with an automated script, the process was still unreliable. We often had to complete the deployment manually due to failures in the run.
Solution: Take 2
Then we started the second phase of improvements - we changed the entire deploy script, splitting the process into steps, where every step was atomic. Thus, in case of a failure we didn’t have to restart the whole process, just from the failed step onwards. This change helped reduce the complexity and made it a lot faster. We also investigated the common deployment issues and fixed them for good.
The deployments that usually averaged 5 to 6 hours, with a maximum of 10 hours, were down to 2 hours at the end of the two improvement phases. The project and company management were thrilled and this further boosted the team’s morale.
The next steps on our Continuous Delivery journey, will be to split code, data and infrastructure changes, so it will be possible to release new versions with no system downtime. There are a lot of techniques to help with that, and right now we are investigating the context so that we can chose the solution that will be the best fit for us. Stay tuned for more...