Bug Alchemy: Mastering the art of software remediation


Andrews Roberta Mary R

Published: October 17, 2023

Everyone wants to build bug-free software from the get-go, but in practice, that's nearly impossible. The complexity of software systems, combined with the diverse hardware and software environments in which they operate, makes it challenging to eliminate all potential defects. So when a bug does appear, the only real option is to address it promptly and efficiently. In this article, I present a bug alchemy framework to help you master the art of software remediation.

Bug triage

When a bug surfaces, it's important to assess its priority so you can manage fixes effectively. But how do you evaluate "priority"? Organizations and projects have their own bug triage frameworks, but one commonly used and effective approach is to assess a bug's severity and frequency, then derive its priority from the two.

Assessing bug severity

Bug severity refers to the impact a bug has on the software's functionality or user experience. Common categories of bug severity are:

Critical/High: These are bugs that cause a complete failure of critical functions, result in data loss or pose severe security risks.

Example:

Bug description	Impact
The login page of a web application is vulnerable to SQL injection.	Allows unauthorized access, compromising sensitive user information.

Major/Medium: These are bugs that affect important features and functionality but don’t cause a complete system failure.

Example:

Bug description	Impact
On an e-commerce website, pagination on item review pages fails to load.	Intermittent issues affect UX and frustrate customers, potentially leading to a drop in sales.

Minor/Low: These are bugs that have a minimal impact on functionality or can be easily worked around by users.

Example:

Bug description	Impact
A mobile app displays a minor formatting issue in the footer on some devices.	Cosmetic and does not hinder critical functionality, UX or business operations.

Assessing bug frequency

Once you've assigned severity, you need to determine how often the failure occurs. A bug on Facebook might occur only occasionally, but when it does, it can still affect millions of users. This is why you need to consider the size of your user base when assessing frequency.
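To make the user-base point concrete, here is a minimal Python sketch. It buckets a failure rate into the Always/Sometimes/Rarely categories used in the priority table below, then promotes a "rare" bug one level when the absolute number of affected users is large. The function name `frequency_bucket` and every threshold (50%, 1% and 10,000 users) are hypothetical assumptions for illustration, not part of any standard framework.

```python
def frequency_bucket(
    failure_rate: float,
    active_users: int,
    large_impact_users: int = 10_000,  # hypothetical threshold
) -> str:
    """Classify how often a bug occurs as "Always", "Sometimes" or "Rarely".

    failure_rate is the share of users (or sessions) that hit the bug, from 0.0 to 1.0.
    """
    if failure_rate >= 0.5:      # hypothetical cut-off
        bucket = "Always"
    elif failure_rate >= 0.01:   # hypothetical cut-off
        bucket = "Sometimes"
    else:
        bucket = "Rarely"

    # A bug that is rare in relative terms can still reach many people on a
    # large user base, so promote it one level when the absolute count is high.
    affected_users = failure_rate * active_users
    if bucket == "Rarely" and affected_users >= large_impact_users:
        bucket = "Sometimes"
    return bucket


# Same 0.01% failure rate, very different reach:
print(frequency_bucket(0.0001, 1_000_000))    # Rarely (about 100 users)
print(frequency_bucket(0.0001, 500_000_000))  # Sometimes (about 50,000 users)
```

The thresholds are the part to tune. A small internal tool and a consumer app with a huge audience would reasonably set `large_impact_users` very differently.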

Assessing bug priority based on severity and frequency

Bug priority refers to the order in which bugs should be addressed and fixed. Common categories of bug priority are:

P0/Red: Bugs that need to be fixed immediately as they severely impact users or business operations.
P1/Amber: Important bugs that do not require immediate attention. They may be fixed in the next regular development cycle.
P2/Yellow: Bugs that have minimal impact or can be deferred for future releases without affecting the core functionality.

Below is a custom framework to prioritize bugs based on severity and frequency.

Frequency (rows) / Severity (columns)	High	Medium	Low
Always	1	1	2
Sometimes	1	2	3
Rarely	2	3	3

Levels: 1 - high priority, 2 - medium priority, 3 - low priority
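Keeping the matrix as data rather than prose helps tooling and dashboards apply it consistently. The following is a minimal Python sketch. It assumes severity and frequency are recorded as the plain labels used in the table, and `PRIORITY_MATRIX` and `priority_level` are hypothetical names.

```python
# Rows are frequency, columns are severity; values are priority levels
# (1 = high, 2 = medium, 3 = low), copied from the table above.
PRIORITY_MATRIX: dict[str, dict[str, int]] = {
    "Always":    {"High": 1, "Medium": 1, "Low": 2},
    "Sometimes": {"High": 1, "Medium": 2, "Low": 3},
    "Rarely":    {"High": 2, "Medium": 3, "Low": 3},
}


def priority_level(severity: str, frequency: str) -> int:
    """Look up a bug's priority level; raises KeyError for an unknown label."""
    return PRIORITY_MATRIX[frequency][severity]


print(priority_level("Medium", "Sometimes"))  # 2
print(priority_level("High", "Rarely"))       # 2
```

Because the rules live in a dictionary, adding more severity or frequency levels means adding rows and columns rather than changing code.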

Based on the needs of your organization, project and capabilities, you might incorporate further levels of severity (Critical/Major/Minor/Cosmetic) and/or frequency (Always/Frequently/Sometimes/Rarely) and proceed accordingly.

Bug court

By the time all bugs have been triaged, the P0/Red bugs will already have been picked up and fixed. P1s, however, need a different approach. One method is a "bug court", where development, testing and product management stakeholders meet to advocate for the various bugs in the backlog.

QAs can provide insight into the technical aspects of a bug and its potential implications for the overall system, while business analysts (BAs) and product owners (POs) evaluate its impact on the business and UX. By holding bug courts at a regular cadence, teams can prioritize and schedule bugs for resolution in upcoming releases.

Bug war

A bug war is set up when a substantial backlog of bugs has accumulated and the business decides to prioritize bug fixes over new features. In a bug war, developers switch into a hackathon-style mode, selecting bugs by priority and resolving them through desk checks. As developers become available, they pick up the next bug on the priority list.
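This pull-based flow, where each newly available developer takes the highest-priority open bug, behaves like a priority queue. Below is a minimal Python sketch using the standard `heapq` module. It assumes each bug carries a numeric priority where 1 is highest. `build_queue` and `pick_next` are hypothetical helpers, and the backlog position breaks ties so that equal-priority bugs are handled first-come, first-served.

```python
import heapq
from typing import Optional


def build_queue(backlog: list[tuple[str, int]]) -> list[tuple[int, int, str]]:
    """Turn (bug_id, priority) pairs, in backlog order, into a min-heap.

    Lower numbers mean higher priority. The backlog position breaks ties so
    equal-priority bugs are picked first-come, first-served.
    """
    queue = [(priority, position, bug_id)
             for position, (bug_id, priority) in enumerate(backlog)]
    heapq.heapify(queue)
    return queue


def pick_next(queue: list[tuple[int, int, str]]) -> Optional[str]:
    """Called when a developer becomes free: hand out the top bug, or None if the backlog is empty."""
    if not queue:
        return None
    _priority, _position, bug_id = heapq.heappop(queue)
    return bug_id


queue = build_queue([("PROJECT1-126", 2), ("PROJECT1-123", 1), ("PROJECT1-124", 1)])
print(pick_next(queue))  # PROJECT1-123
print(pick_next(queue))  # PROJECT1-124
print(pick_next(queue))  # PROJECT1-126
```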

Gamifying bug resolution with bug-fix-bounties and leaderboards can foster healthy competition and motivate developers. Weighting each bug by its complexity makes the competition fairer, because some bugs will inevitably be much harder to fix than others. This is illustrated in the table below:

JIRA ID	Bug summary	Component / microservice to fix	Bug priority	Weightage (story points)	Developer
PROJECT1-123	Performance degradation in orders page	Component A	1	2	Robin
PROJECT1-124	Performance degradation in orders page	Component B	1	1	Daisy
PROJECT1-125	Performance degradation in orders page	Component A	1	2	Violet
PROJECT1-126	Field validation not happening for certain inputs	Component C	2	3	Daisy

In the example above, Daisy scores 4 points (1 + 3) against 2 points each for Robin and Violet. She therefore becomes the bug war winner and receives the well-deserved bug-fix-bounty for her exceptional contributions.
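Tallying a leaderboard like this is simple to automate. The minimal Python sketch below sums the weightage per developer from the rows in the table above. The `leaderboard` helper is hypothetical, and in practice the rows might come from an export of your issue tracker.

```python
from collections import Counter

# (JIRA ID, developer, weightage in story points), copied from the table above.
fixed_bugs = [
    ("PROJECT1-123", "Robin", 2),
    ("PROJECT1-124", "Daisy", 1),
    ("PROJECT1-125", "Violet", 2),
    ("PROJECT1-126", "Daisy", 3),
]


def leaderboard(rows: list[tuple[str, str, int]]) -> list[tuple[str, int]]:
    """Sum the weightage of fixed bugs per developer, highest total first."""
    scores: Counter[str] = Counter()
    for _jira_id, developer, points in rows:
        scores[developer] += points
    return scores.most_common()


print(leaderboard(fixed_bugs))  # [('Daisy', 4), ('Robin', 2), ('Violet', 2)]
```

Summing story points rather than counting tickets is what rewards Daisy's harder fix. Under a simple ticket count she would still lead, but only by one bug.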

Bugs in production

Get it sorted out

Despite your best efforts, critical production issues might sneak in from time to time. When they do, stay calm and prioritize. If an issue needs an immediate fix, assign a pair of skilled developers to work on it. In a DevOps 'you build it, you run it' setting, this may mean the team responsible for development is also responsible for operations. Once the fix is implemented, the QA in the team can test it thoroughly and rigorously. Remember to test not only the fix but also its impact on the surrounding codebase to avoid introducing new issues. After deploying the fix, perform a sanity check in the production environment to ensure everything functions as expected.
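A production sanity check can be scripted as a set of named checks, each returning `True` on success. The Python sketch below assumes you supply those checks yourself (for example, calls to production health endpoints or key user journeys). `run_sanity_checks` is a hypothetical helper. It treats a check that raises an exception as a failure, so one broken check does not hide the others.

```python
from typing import Callable


def run_sanity_checks(checks: dict[str, Callable[[], bool]]) -> list[str]:
    """Run every post-deployment check and return the names of the ones that failed."""
    failed = []
    for name, check in checks.items():
        try:
            passed = bool(check())
        except Exception:
            # An exception counts as a failure; keep going so later checks still run.
            passed = False
        if not passed:
            failed.append(name)
    return failed


# Hypothetical checks; real ones would exercise production endpoints or key journeys.
checks = {
    "orders page responds": lambda: True,
    "login succeeds": lambda: False,
    "payment callback": lambda: 1 / 0 > 0,  # simulates a check that crashes
}
print(run_sanity_checks(checks))  # ['login succeeds', 'payment callback']
```

An empty result means every check passed. Anything else is a signal to investigate, or to roll back, before declaring the incident resolved.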

RCA and the path forward

Mistakes are meant to be learned from, not repeated, as the saying goes. Root cause analysis (RCA) makes that possible. RCA is the process of investigating an issue to identify its underlying causes and to understand why it wasn't caught in earlier phases of development and testing. The key steps in conducting an RCA and acting on its findings are:

Identify the root cause: Analyze the data. Look for patterns, dependencies and potential factors contributing to the problem.
Assess testing gaps: Review the testing processes at each phase (analysis, kickoff, implementation, DevBox, testing and regression) to understand why the issue wasn’t caught earlier.
Process improvement: Based on the above findings, identify areas for process improvement, such as enhancing testing strategies, introducing additional test cases or improving collaboration.
Automation: As part of the solution, consider introducing automated tests to cover the specific case that led to the production issue.
Documentation: Document the RCA findings, recommended actions and preventive measures to share with the team and stakeholders.
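To illustrate the automation step, a regression test can pin the exact inputs that slipped through to production. The sketch below uses Python's built-in `unittest`. The `validate_quantity` function, its 1 to 99 range and the failing inputs are all hypothetical, loosely modelled on a "field validation not happening for certain inputs" bug.

```python
import unittest


def validate_quantity(raw: str) -> int:
    """Hypothetical field validator: accept whole numbers from 1 to 99, else raise ValueError."""
    value = int(raw.strip())  # int() raises ValueError for empty or non-numeric input
    if not 1 <= value <= 99:
        raise ValueError(f"quantity out of range: {value}")
    return value


class QuantityValidationRegressionTest(unittest.TestCase):
    """Pins the inputs that escaped to production so the bug cannot quietly return."""

    def test_rejects_inputs_that_escaped_to_production(self):
        for raw in ["0", "-5", "100", "", "abc"]:
            with self.subTest(raw=raw):
                with self.assertRaises(ValueError):
                    validate_quantity(raw)

    def test_still_accepts_valid_input(self):
        self.assertEqual(validate_quantity(" 3 "), 3)
```

Run it with your usual test runner (for example, `python -m unittest`). Keeping it in the regular suite means the same case is checked on every future change.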

I've outlined best practices and processes here, but remember that promptly addressing defects depends on a well-coordinated effort between developers, testers and product managers. This is particularly critical in a DevOps setting, where teams have end-to-end responsibility. Effective collaboration and communication are essential to ensure defects are addressed and that software in production operates smoothly. Building a culture of transparency and collaboration can't be done just by following a step-by-step process. Transparency is fundamental to the success of bug courts. Collaboration matters because production issues often need multiple developers working together in real time, and RCAs often find that problems begin with communication between team members. So use the bug alchemy framework to build a remediation practice, but remember to communicate, adapt and collaborate as a team.

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
