One of the most important decisions a FinOps team will make is choosing the right metrics to track. The right metrics will provide the team with the insights they need to make informed decisions about cloud costs. However, with so many potential metrics to choose from, it can be difficult to know where to start.
This blog post will discuss the features that a good metric has and the questions that we need to ask while selecting the right metrics.
Features of a good metric
A good cost optimization metric has the following features.
It is easy to understand: It should not involve any complex math, and it should be meaningful to the team that uses it.
It changes over time: The whole point of including a metric in a report is that it is volatile and changes over time. If the metric is static, it does not make sense to include it in the report.
It is actionable: There should be defined actions for changes in the metric's value. If the metric value changes and we do not need to take any action, then the metric should not be included.
It is not obvious: A metric that only tells the team what it already knows should not be included in the report.
Questions to ask before selecting a metric
What exactly are we trying to measure? It is crucial to define the precise financial aspect or performance indicator that the metric aims to capture, ensuring clarity in tracking financial goals.
Does it give us enough information to act on it? How can we improve the values of the metric to acceptable thresholds? This question ensures that the selected metric provides actionable insights, enabling informed decisions and interventions for financial optimization.
How can we find the value of this metric in the Cloud? This is essential to determine the feasibility of collecting and calculating the metric within a Cloud environment, ensuring data availability.
Does the value of this metric change over time? This question assesses how the metric behaves over time. It does not make sense to track a metric whose value remains the same.
Why do we need to add this metric and what are we trying to achieve? This question clarifies the rationale for incorporating the metric, linking it to specific financial objectives and desired outcomes.
Example one:
Spend rate:
Spend rate is the most basic metric, and almost all organizations will include it in their production costing report. Let's answer the questions listed above and then decide whether we need to add this metric.
What exactly are we trying to measure?
We are trying to measure the daily/monthly/quarterly spend in the cloud.
Why do we need to add this metric and what are we trying to achieve?
It allows us to monitor the spending trend and helps detect sudden increases and decreases in spend.
How can we find the value of this metric in the Cloud?
In AWS you can find it in AWS Cost Explorer, and in Azure you can find it in the Cost analysis tool.
Does the value of this metric change over time?
Yes
Does it give us enough information to act on it?
Yes. We define a threshold according to our daily/monthly/quarterly budget. If the spend exceeds the budget, we investigate the source of the cost increase and determine whether it was intentional, planned, unplanned, or driven by external business factors.
With compelling answers to all of the questions, it makes sense to include the metric in the report.
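The threshold check described above can be sketched in a few lines of Python. This is only an illustration: the daily figures, the `DAILY_BUDGET` value, and the function name `find_budget_breaches` are all hypothetical, and in practice the spend data would come from an export of your cloud provider's cost tool.

```python
from datetime import date


def find_budget_breaches(daily_spend, daily_budget):
    """Return (day, amount, overspend) for each day whose spend exceeds the budget."""
    breaches = []
    for day, amount in sorted(daily_spend.items()):
        if amount > daily_budget:
            breaches.append((day, amount, round(amount - daily_budget, 2)))
    return breaches


# Hypothetical daily spend, for illustration only.
daily_spend = {
    date(2024, 1, 1): 410.0,
    date(2024, 1, 2): 395.5,
    date(2024, 1, 3): 612.25,
}
DAILY_BUDGET = 450.0  # assumed threshold derived from the budget

breaches = find_budget_breaches(daily_spend, DAILY_BUDGET)
for day, amount, over in breaches:
    print(f"{day}: spent {amount:.2f}, {over:.2f} over budget, investigate the source")
```

Each breach is a prompt to start the investigation, which is what makes the metric actionable.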
Example two:
Top three services by cost expenditure:
Let's look at another example, one that makes sense at first but does not make it to the report once we try to answer our questions.
What exactly are we trying to measure?
We are trying to list the top three services by the amount we spend on them.
Why do we need to add this metric and what are we trying to achieve?
While it is good to know which services are incurring the most cost, it is not clear how knowing this will help us.
How can we find the value of this metric in the Cloud?
In AWS you can find it in AWS Cost Explorer, and in Azure you can find it in the Cost analysis tool, by grouping the costs by service.
Does the value of this metric change over time?
No. Our systems have matured and are no longer in the build phase, so the top three services pretty much stay the same.
Does it give us enough information to act on it?
After we know the top three services there is no clarity on what action we need to take on them. Also, it is difficult to define a threshold on this metric.
Unfortunately, the metric in example two fails to have compelling answers to our questions, and it does not make sense to include it in the report. Please bear in mind that we have taken an example where the services are stable. The answers to these questions may vary from person to person and from organization to organization.
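One way to test the "does it change over time" question for this metric is to compare the top three services across several billing periods. The sketch below assumes hypothetical monthly cost figures grouped by service; the service names, amounts, and function names are made up for illustration.

```python
def top_services(cost_by_service, n=3):
    """Return the names of the n most expensive services, highest cost first."""
    ranked = sorted(cost_by_service.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def top_set_changed(monthly_costs, n=3):
    """Return True if the set of top-n services differs between consecutive months."""
    tops = [set(top_services(costs, n)) for costs in monthly_costs]
    return any(a != b for a, b in zip(tops, tops[1:]))


# Hypothetical costs grouped by service for three consecutive months.
monthly_costs = [
    {"compute": 5200, "database": 3100, "storage": 1400, "networking": 600},
    {"compute": 5350, "database": 3000, "storage": 1450, "networking": 640},
    {"compute": 5100, "database": 3250, "storage": 1380, "networking": 700},
]

changed = top_set_changed(monthly_costs)
print("Top three changed over the period:", changed)
```

If the result stays `False` month after month, as it would for a mature system like the one in this example, the metric is effectively static and is a weak candidate for the report.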
Example three:
Cost per order
Let's take a look at a more complex example, where the metric not only helps keep infrastructure cost in check but also helps determine the overall health of the business.
What exactly are we trying to measure?
We are trying to measure how much cloud cost we incur for every order we receive.
Why do we need to add this metric and what are we trying to achieve?
This metric is a good way to determine the profit, loss, or investment made in fulfilling an order, which helps us keep costs in check.
How can we find the value of this metric in the Cloud?
We can get the total daily/monthly cloud bill and the total number of orders received in the same day/month. Their ratio is our metric.
Does the value of this metric change over time?
Yes. New features, the cloud infrastructure added to serve those features, and changes in product pricing and sales will all change the value.
Does it give us enough information to act on it?
Yes. With further drill-down we can determine which features are cost effective and which are not, which helps not just with infrastructure cost management but also with business decisions.
This metric has compelling answers for all our questions and it definitely makes sense to include this in the report. This metric makes more sense for companies/startups in E-commerce and Retail.
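As a rough illustration, cost per order is simply the cloud bill for a period divided by the number of orders received in that period. The monthly figures and the function name below are hypothetical, and returning `None` for a period with no orders is a design assumption of this sketch rather than a rule.

```python
def cost_per_order(total_cloud_cost, order_count):
    """Cloud cost divided by orders; None when there were no orders in the period."""
    if order_count == 0:
        return None
    return total_cloud_cost / order_count


# Hypothetical (month, cloud bill, orders received) figures.
monthly = [
    ("2024-01", 12000.0, 48000),
    ("2024-02", 12600.0, 45000),
    ("2024-03", 13000.0, 0),
]

results = {month: cost_per_order(cost, orders) for month, cost, orders in monthly}
for month, value in results.items():
    label = "n/a (no orders)" if value is None else f"{value:.2f}"
    print(f"{month}: cost per order = {label}")
```

A rising value from one period to the next is the signal to drill down into which features or infrastructure changes are driving the increase.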
The same metric might have different answers depending on the type of industry and the organization asking the questions. But there are a few metrics which help keep costs in check, and it almost always makes sense to include them in reports irrespective of the industry or organization:
Reservation coverage: Reservation coverage measures the percentage of your cloud infrastructure costs that are covered by reservations. It helps you determine how effectively you are leveraging reserved instances to reduce your overall cloud expenses.
Idle resource costs: This metric tracks the percentage of cloud costs attributed to resources that are not actively utilized. Aim to keep idle resource costs below 5 to 10% of your total cloud spend.
Cost as a percentage of revenue: This metric helps evaluate whether cloud costs are in line with business revenue and profitability goals.
Resource utilization efficiency: This metric assesses how effectively cloud resources are utilized, helping identify underutilized or overprovisioned assets.
Monthly/Weekly variance in cloud costs: Monitoring monthly/weekly cost variances helps ensure that cloud spending remains within budgeted limits.
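Several of the metrics in this list are simple ratios and can be sketched as follows. All figures and function names are hypothetical, and treating variance as the percentage change from the previous period is an assumption of this sketch; your team may define it against a budget instead.

```python
def percentage(part, whole):
    """Return part as a percentage of whole."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return 100.0 * part / whole


def reservation_coverage(reserved_cost, total_cost):
    """Share of cloud cost that is covered by reservations."""
    return percentage(reserved_cost, total_cost)


def idle_cost_share(idle_cost, total_cost):
    """Share of cloud cost attributed to resources that are not actively used."""
    return percentage(idle_cost, total_cost)


def cost_to_revenue(total_cost, revenue):
    """Cloud cost as a percentage of revenue."""
    return percentage(total_cost, revenue)


def period_variance(current_cost, previous_cost):
    """Percentage change in cost compared with the previous period (assumed definition)."""
    return percentage(current_cost - previous_cost, previous_cost)


# Hypothetical monthly figures.
total_cost, previous_cost = 20000.0, 18500.0
reserved_cost, idle_cost, revenue = 13000.0, 1500.0, 400000.0

coverage = reservation_coverage(reserved_cost, total_cost)
idle_share = idle_cost_share(idle_cost, total_cost)
revenue_share = cost_to_revenue(total_cost, revenue)
variance = period_variance(total_cost, previous_cost)

print(f"Reservation coverage: {coverage:.1f}%")
print(f"Idle resource costs: {idle_share:.1f}% of spend")
print(f"Cost as a percentage of revenue: {revenue_share:.1f}%")
print(f"Month-over-month variance: {variance:.1f}%")
```

With these made-up numbers the idle share sits inside the 5 to 10% band mentioned above, so it would be worth watching rather than acting on immediately.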
Choosing the right metrics for a FinOps report is an important decision. The right metrics will provide the team with the insights they need to make informed decisions about cloud costs. By considering the key factors outlined in this blog post, teams can choose the metrics that are most relevant to their specific needs.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.