radar blip

Focus on mean time to recovery

Last updated: May 05, 2015
NOT ON THE CURRENT EDITION
This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.
May 2015
Adopt: We feel strongly that the industry should be adopting these items. We use them when appropriate on our projects.

Traditionally, operations groups have looked to improve the mean time between failures. While avoiding failures is obviously still important, lessons from cloud computing have taught us to expect failure and to focus instead on mean time to recovery. Continuous Delivery automation makes rolling out rapid fixes easier, and we are also seeing growth in monitoring techniques that spot failures quickly through a ‘production immune system’. Teams are also successfully using semantic monitoring and synthetic transactions to exercise production systems in non-destructive ways. This combined focus allows teams to move rapidly with higher confidence. It can also reduce the emphasis on expensive test execution in pre-production environments, and it is particularly important in responding to the ever-growing list of security vulnerabilities being discovered.
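To make the two metrics concrete, here is a minimal sketch of how a team might compute them from an incident log. The `Incident` record, the function names, and the sample timestamps are all hypothetical and chosen for illustration; real teams will define "failure start" and "recovery" in their own way.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    """One outage: when the failure began and when service was restored."""
    started: datetime
    resolved: datetime


def mean_time_to_recovery(incidents: list[Incident]) -> timedelta:
    """Average duration from the start of a failure to its resolution."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((i.resolved - i.started for i in incidents), timedelta())
    return total / len(incidents)


def mean_time_between_failures(incidents: list[Incident]) -> timedelta:
    """Average uptime between one recovery and the next failure."""
    if len(incidents) < 2:
        raise ValueError("need at least two incidents")
    ordered = sorted(incidents, key=lambda i: i.started)
    gaps = [b.started - a.resolved for a, b in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


# Hypothetical incident log, for illustration only.
incidents = [
    Incident(datetime(2015, 1, 5, 10, 0), datetime(2015, 1, 5, 10, 30)),
    Incident(datetime(2015, 2, 1, 9, 0), datetime(2015, 2, 1, 9, 10)),
    Incident(datetime(2015, 3, 1, 14, 0), datetime(2015, 3, 1, 14, 20)),
]
mttr = mean_time_to_recovery(incidents)
mtbf = mean_time_between_failures(incidents)
print(f"MTTR: {mttr}, MTBF: {mtbf}")
```

Under these assumptions, the three outages last 30, 10 and 20 minutes, so the mean time to recovery is 20 minutes. Shortening those outages improves that figure even if failures remain just as frequent.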

Jan 2015
Adopt: We feel strongly that the industry should be adopting these items. We use them when appropriate on our projects.
Jul 2014
Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk.
In DevOps-savvy organizations, delivery teams often configure production monitoring and respond to incidents themselves. This visibility and access into production environments allow those teams to make changes to their systems to improve their ability to recover quickly when something goes wrong. This focus on mean time to recovery improves quality of service overall, and allows teams to safely deploy more frequently. This can also reduce the emphasis on expensive test execution in non-production environments. Techniques we've used include end-to-end 'semantic monitoring' or reconciliation of real business transactions, and the injection of 'synthetic transactions' which exercise systems in non-destructive ways in production.
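A synthetic transaction can be sketched roughly as follows. The `OrderService` class is a hypothetical in-memory stand-in for a production API, and the `synthetic` flag is an assumed convention rather than anything the Radar prescribes. The sketch shows a probe that runs a real business journey, marks its traffic so that downstream reporting can exclude it, and cleans up afterwards.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class OrderService:
    """Hypothetical in-memory stand-in for a production order API."""
    orders: dict = field(default_factory=dict)

    def place_order(self, customer: str, sku: str, synthetic: bool = False) -> str:
        order_id = str(uuid.uuid4())
        self.orders[order_id] = {
            "customer": customer,
            "sku": sku,
            "synthetic": synthetic,
            "status": "confirmed",
        }
        return order_id

    def get_order(self, order_id: str) -> dict:
        return self.orders[order_id]

    def cancel_order(self, order_id: str) -> None:
        self.orders[order_id]["status"] = "cancelled"

    def billable_orders(self) -> list:
        # Downstream reporting ignores synthetic and cancelled orders.
        return [
            o for o in self.orders.values()
            if not o["synthetic"] and o["status"] == "confirmed"
        ]


def run_synthetic_transaction(service: OrderService) -> bool:
    """Exercise the order journey end to end, then cancel the test order."""
    order_id = service.place_order("synthetic-monitor", "TEST-SKU", synthetic=True)
    try:
        return service.get_order(order_id)["status"] == "confirmed"
    finally:
        # Non-destructive: always undo the probe's side effects.
        service.cancel_order(order_id)


service = OrderService()
healthy = run_synthetic_transaction(service)
print("order journey healthy:", healthy)
```

In a real deployment the probe would run on a schedule against the live system and feed its result into alerting. The important design choice is that synthetic traffic is tagged at the source so it never reaches billing or analytics.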
Jan 2014
Assess: Worth exploring with the goal of understanding how it will affect your enterprise.
In previous radars we recommended arranging automated acceptance tests into longer journeys and, in what we call semantic monitoring, running these tests continuously against a production environment. We still believe that this is an important technique for scenarios the team can anticipate in advance. A variation of this approach, seen especially with startups, is to reduce the number of tests while increasing monitoring and automatic alarms. This shifts the focus from avoiding problems that can be anticipated to reducing mean time to recovery for all problems.
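As one illustration of "monitoring and automatic alarms", the sketch below fires when the error rate over a sliding window of recent requests exceeds a threshold. The class name, window size, and threshold are assumptions made for this example. Production systems would normally rely on an existing monitoring stack rather than hand-rolled code.

```python
from collections import deque


class ErrorRateAlarm:
    """Fires when the failure ratio over the last `window` requests exceeds `threshold`."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        self.outcomes = deque(maxlen=window)  # oldest outcomes drop off automatically
        self.threshold = threshold

    def record(self, ok: bool) -> bool:
        """Record one request outcome; return True if the alarm should fire."""
        self.outcomes.append(ok)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data yet to judge
        failures = sum(1 for outcome in self.outcomes if not outcome)
        return failures / len(self.outcomes) > self.threshold


# Hypothetical traffic: ten successes followed by three failures.
alarm = ErrorRateAlarm(window=10, threshold=0.2)
fired = [alarm.record(ok) for ok in [True] * 10 + [False] * 3]
print(fired)
```

With a window of 10 and a threshold of 0.2, the alarm stays quiet through the first two failures and fires on the third, when 3 of the last 10 requests have failed. Unlike a pre-written test, this catches failures nobody anticipated, which is the shift in focus described above.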
May 2013
Assess: Worth exploring with the goal of understanding how it will affect your enterprise.
Published: May 22, 2013
