The client’s Order Routing team wanted to improve their algorithm by using demand forecasts to further optimize gross margin per online order. The principle is that a demand forecast pinpoints the most ‘disadvantaged’ location: a location where the product is likely to go on markdown (a 50% off sale) in the near future. Margin on the online order can therefore be improved by fulfilling the product from such a disadvantaged location.
Let’s dig into this with an example. It’s fall, and the retailer has received an online order for a pair of shorts. Two stores stock the product, one in Chicago and the other in San Diego. Our demand forecast tells us that the approaching winter in Chicago will reduce demand for the product, which means the shorts there are likely to go on markdown. In sunny San Diego, by contrast, the product will sell at full price. The order should therefore be fulfilled from Chicago, the disadvantaged location, to yield a better margin.
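The Chicago/San Diego example can be sketched in a few lines. This is a minimal illustration, not the client’s algorithm: it assumes the forecast can be reduced to a per-location markdown probability, and the names `Location`, `markdown_probability`, `expected_local_price` and `pick_fulfillment_location`, as well as the probabilities, are hypothetical values chosen for illustration. The idea is to ship the unit that would be worth least if it stayed where it is.

```python
from dataclasses import dataclass

# The article's example markdown is a 50% off sale.
MARKDOWN_DISCOUNT = 0.5


@dataclass(frozen=True)
class Location:
    name: str
    # Hypothetical forecast-derived input: chance the product goes on markdown here.
    markdown_probability: float


def expected_local_price(full_price: float, loc: Location) -> float:
    """Expected price a unit would fetch if it stayed and sold at this location."""
    p = loc.markdown_probability
    return (1 - p) * full_price + p * full_price * (1 - MARKDOWN_DISCOUNT)


def pick_fulfillment_location(full_price: float, candidates: list[Location]) -> Location:
    """Fulfil from the most 'disadvantaged' location: the one whose stock is
    expected to be worth least if left in place. Shipping costs are ignored."""
    return min(candidates, key=lambda loc: expected_local_price(full_price, loc))


chicago = Location("Chicago", markdown_probability=0.8)
san_diego = Location("San Diego", markdown_probability=0.1)
print(pick_fulfillment_location(40.0, [chicago, san_diego]).name)  # Chicago
```

A production router would also weigh shipping costs, inventory levels and other constraints. This sketch isolates only the markdown effect that the demand forecast contributes.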
A case for the Test-and-Learn Platform
The Order Routing algorithm is a ‘sensitive’ algorithm: small changes to it, such as accounting for higher shipping costs at one location than at another, can have a big impact on the margin of fulfilled orders. This is why prototyping is the preferred path when evolving the algorithm.
The project team’s new demand forecast approach also went through a prototype stage, but with certain limitations. The prototype could not be built for the entire data set because of its sheer volume, which made the results inaccurate. There was also no real-time feedback on the changes planned for the production software, because the prototype was disconnected from the production systems. We needed to find a better solution!
Genesis of the Platform
Our client partnered with ThoughtWorks not just for our software development expertise but also to instill the culture and capabilities of Lean Thinking (test-and-learn). Given the algorithm’s sensitivity, a real-time feedback loop for software changes was also essential.
Although the prototype results pointed to a solution, they did not inspire much confidence. In response, the team built a test-and-learn platform that pitted multiple solutions against one another and used business metrics to determine the best one.
While test-and-learn is commonly used to test variations of user interfaces, this situation was unique because we were testing business logic. Moreover, the testing had to be carried out on a low-latency enterprise backend system, within a complex business domain and a high-risk environment.
Getting this right was paramount and called for a well-thought-out architecture that would support the business need.
Test-and-Learn Platform Architecture
Let’s go into a little more detail about the architecture we were working with. A test-and-learn platform conventionally has three components:
- An experimentation framework to run multiple experiments in parallel
- An analytics framework to validate the experiments
- A business impact dashboard to monitor business health metrics in real time
Figure: Test-and-learn platform architecture
The algorithm in production is called the ‘active’ algorithm. New solutions are implemented as experiment algorithms that run in ‘passive’ mode, in isolation and without affecting the active algorithm. This safety feature was instrumental in getting us the green light to build the platform. The active algorithm’s input parameters, along with the necessary transient data, are passed to the experiment algorithms, which ensures an apples-to-apples comparison.
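The active/passive split can be sketched as a “shadow” router: the active algorithm’s decision is the only one acted on, while each experiment receives the same inputs and its output is merely recorded for the analytics framework. This is a minimal sketch under assumptions; `ShadowRouter`, `RoutingAlgorithm` and the `decisions` log are hypothetical names, and experiments run synchronously here for clarity, whereas a low-latency system would more likely run them off the request path.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable

logger = logging.getLogger(__name__)

# Hypothetical shape: an algorithm takes order inputs and returns a fulfillment location.
RoutingAlgorithm = Callable[[dict], str]


@dataclass
class ShadowRouter:
    active: RoutingAlgorithm
    experiments: dict[str, RoutingAlgorithm] = field(default_factory=dict)
    # (algorithm name, order inputs, decision) tuples for the analytics framework.
    decisions: list[tuple[str, dict, str]] = field(default_factory=list)

    def route(self, order: dict) -> str:
        decision = self.active(order)
        self.decisions.append(("active", order, decision))
        for name, algorithm in self.experiments.items():
            try:
                # Same inputs as the active algorithm; a copy so one experiment
                # cannot mutate what the others see.
                self.decisions.append((name, order, algorithm(dict(order))))
            except Exception:
                # Passive mode: a failing experiment is logged, never propagated.
                logger.exception("experiment %s failed", name)
        # Only the active algorithm's decision is ever acted on.
        return decision


router = ShadowRouter(
    active=lambda order: "warehouse",
    experiments={
        "forecast_v1": lambda order: "Chicago",
        "broken": lambda order: 1 / 0,  # deliberately failing experiment
    },
)
print(router.route({"sku": "shorts", "qty": 1}))  # warehouse
```

The failing experiment is logged and skipped, so the order is still routed by the active algorithm, which is the isolation property that made the passive approach safe to run in production.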
The analytics framework compares the experiment algorithms’ output with the active algorithm’s output using complex business calculations (gross margin lift). The winning experiment algorithm is then promoted to production and made active, either by merging it into the active algorithm or by running it alongside the active algorithm.
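The real gross margin lift calculation is described only as “complex”, so the following is a deliberately simplified sketch. It assumes per-order gross margins have already been estimated for both the active and the experiment decisions on the same orders, and defines lift as the relative change in total margin. `gross_margin_lift`, `pick_winner`, the `min_lift` threshold and all figures are hypothetical.

```python
def gross_margin_lift(active: list[float], experiment: list[float]) -> float:
    """Relative change in total gross margin when the experiment's decisions
    replace the active algorithm's decisions on the same set of orders."""
    if len(active) != len(experiment):
        raise ValueError("lift must be computed over the same orders")
    baseline = sum(active)
    if baseline <= 0:
        raise ValueError("baseline margin must be positive")
    return (sum(experiment) - baseline) / baseline


def pick_winner(lifts: dict[str, float], min_lift: float = 0.0):
    """Name of the experiment with the highest lift, or None if none beats min_lift."""
    if not lifts:
        return None
    name, lift = max(lifts.items(), key=lambda item: item[1])
    return name if lift > min_lift else None


# Illustrative per-order margins for four orders (not real data).
active_margins = [16.0, 10.0, 12.0, 2.0]
lifts = {
    "forecast_v1": gross_margin_lift(active_margins, [17.0, 10.0, 12.0, 3.0]),
    "forecast_v2": gross_margin_lift(active_margins, [15.0, 10.0, 12.0, 2.0]),
}
print(pick_winner(lifts, min_lift=0.01))  # forecast_v1
```

Comparing totals over the same orders keeps the comparison apples-to-apples: both algorithms are judged on identical inputs, so the lift reflects the routing decision rather than differences in order mix.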
The business impact dashboard then monitors the impact of newly promoted experiment algorithms on the health metrics, which are Order Routing’s Key Performance Indicators (KPIs).
The Challenges and their Solutions
The test-and-learn platform has been a great success, but the project team had to overcome its fair share of challenges along the way.
Testing high-risk experiments
Running the experiments in passive mode was the perfect way to avoid negative fallout from the experiments on the Order Routing algorithm. Business users could try new improvements without worrying about the consequences. Additionally, the business could turn off a newly promoted experiment after reading the business impact dashboard, and immediately have orders routed through the earlier, safe version of the algorithm.
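Turning off a promoted experiment amounts to a feature toggle in front of two algorithm versions. A minimal sketch, assuming a simple in-memory flag; `PromotedExperimentToggle` and `promoted_enabled` are hypothetical names, and a real deployment would more likely read the flag from a configuration or feature-flag store so it can be flipped without a deploy.

```python
from typing import Callable

RoutingAlgorithm = Callable[[dict], str]


class PromotedExperimentToggle:
    """Routes through a newly promoted algorithm while its flag is on, and
    through the earlier, known-good version as soon as the flag is turned off."""

    def __init__(self, previous: RoutingAlgorithm, promoted: RoutingAlgorithm) -> None:
        self._previous = previous
        self._promoted = promoted
        # Hypothetical in-memory flag standing in for a config/feature-flag store.
        self.promoted_enabled = True

    def route(self, order: dict) -> str:
        algorithm = self._promoted if self.promoted_enabled else self._previous
        return algorithm(order)


router = PromotedExperimentToggle(previous=lambda o: "warehouse", promoted=lambda o: "Chicago")
print(router.route({"sku": "shorts"}))  # Chicago
router.promoted_enabled = False  # the business reads the dashboard and switches it off
print(router.route({"sku": "shorts"}))  # warehouse
```

Keeping the previous version deployed alongside the promoted one is what makes the rollback immediate: switching back is a flag change rather than a redeploy.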
Receiving fast feedback in an enterprise setting
Quick feedback is the lifeblood of experimentation, so the team built a low-fi implementation of its key dependency, the demand forecast. The team imported a small subset of real data into the application database and manually refreshed it at regular intervals. This low-fi implementation meant the team could launch the experiment in just a few weeks instead of a few months, and receive valuable feedback from production. The low-fi and hi-fi implementations shared a common interface, so migrating to the hi-fi implementation required minimal change (and produced far less throwaway work).
Figure: Low-fi implementation for faster feedback
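The shared interface between the low-fi and hi-fi forecast implementations can be sketched with a `typing.Protocol`. This is a minimal sketch under assumptions: `DemandForecast`, `markdown_probability`, `SnapshotForecast`, `ServiceForecast` and `most_disadvantaged` are hypothetical names, and the hi-fi client is injected as a plain callable rather than any real forecasting API.

```python
from typing import Callable, Protocol


class DemandForecast(Protocol):
    """Common interface the routing code depends on (hypothetical method name)."""

    def markdown_probability(self, sku: str, location: str) -> float: ...


class SnapshotForecast:
    """Low-fi: a small subset of real forecast data, refreshed manually."""

    def __init__(self, snapshot: dict[tuple[str, str], float], default: float = 0.0) -> None:
        self._snapshot = dict(snapshot)
        self._default = default  # used for SKU/location pairs outside the subset

    def refresh(self, snapshot: dict[tuple[str, str], float]) -> None:
        self._snapshot = dict(snapshot)

    def markdown_probability(self, sku: str, location: str) -> float:
        return self._snapshot.get((sku, location), self._default)


class ServiceForecast:
    """Hi-fi: delegates to the forecasting system through an injected client."""

    def __init__(self, client: Callable[[str, str], float]) -> None:
        self._client = client

    def markdown_probability(self, sku: str, location: str) -> float:
        return self._client(sku, location)


def most_disadvantaged(forecast: DemandForecast, sku: str, locations: list[str]) -> str:
    """Routing logic sees only the interface, so swapping implementations needs no change here."""
    return max(locations, key=lambda loc: forecast.markdown_probability(sku, loc))


low_fi = SnapshotForecast({("shorts", "Chicago"): 0.8, ("shorts", "San Diego"): 0.1})
print(most_disadvantaged(low_fi, "shorts", ["Chicago", "San Diego"]))  # Chicago
```

Because `most_disadvantaged` depends only on the protocol, moving from the snapshot to the service is a change at the point where the forecast object is constructed, which is the “minimal change” property described above.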
Using real data for testing
The real data used in the low-fi version was periodically refreshed from the external dependency. This was a temporary hassle, but it helped uncover many issues early on, not only in the algorithm but also in the data from the dependency. For example, based on our initial learnings, we were able to request the most appropriate of several demand forecast variants. This also let us influence the development of the external dependency.
Receiving fast feedback in a complex domain
Conventionally, web-based A/B test results can be analyzed relatively quickly. In our case, however, the algorithm’s effectiveness, measured as gross margin lift, could only be assessed after 6-12 weeks. In response to this longer loop, the project team came up with the concept of "feedback confidence levels". A quick, low-confidence read was used to make small tweaks to the algorithm, while the highest confidence level (which took longer to reach) was used to determine eligibility for promotion to production.
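One way to encode feedback confidence levels is to tie them to how long an experiment has been observed. This is a hypothetical sketch: the article does not say how levels were defined, so the three levels and the 2-week and 6-week thresholds below are assumptions (only the 6-12 week figure for a full gross margin read comes from the text), and `confidence_level` and `can_promote` are illustrative names.

```python
from enum import Enum


class Confidence(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical thresholds in weeks of observed orders, strictest first.
# The article only states that a full gross-margin read took 6-12 weeks.
LEVELS = [(6.0, Confidence.HIGH), (2.0, Confidence.MEDIUM), (0.0, Confidence.LOW)]


def confidence_level(weeks_observed: float) -> Confidence:
    for min_weeks, level in LEVELS:
        if weeks_observed >= min_weeks:
            return level
    raise ValueError("weeks_observed must be non-negative")


def can_promote(weeks_observed: float) -> bool:
    """Only the highest confidence level makes an experiment eligible for promotion;
    lower levels are still useful for small tweaks."""
    return confidence_level(weeks_observed) is Confidence.HIGH


for weeks in (1, 3, 8):
    print(weeks, confidence_level(weeks).value, can_promote(weeks))
```

In practice a level would likely also depend on sample size and variance, not just elapsed time; the time-based rule keeps the idea of tiered decisions visible without claiming a particular statistical method.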
Identifying direct cause and effect relationship KPIs
The business impact dashboard’s KPIs needed to have a direct cause and effect relationship with the code being changed, while also staying aligned with the organization’s KPIs. For example, shipping cost is an organizational KPI, but it can be affected by multiple organizational initiatives, some of which even work against each other.
To avoid system ‘noise’, the project team picked metrics that were affected only by our system while still giving us a directional read on the organizational KPIs. One such metric was the ‘percentage of shipments from stores versus warehouses’, which the code in our system directly affected. As a general rule of thumb, shipping from warehouses is cheaper than shipping from stores. So if the percentage of shipments skewed heavily toward stores after an experiment went live, shipping costs would rise, negatively affecting the organization’s ‘shipping cost’ KPI.
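The store-versus-warehouse metric and its directional read on shipping cost can be sketched as follows. This is a minimal sketch assuming each shipment is labelled simply `"store"` or `"warehouse"`; `store_shipment_share`, `shipping_cost_warning` and the 5-percentage-point tolerance are hypothetical, and the shipment counts are illustrative.

```python
from collections import Counter

ORIGINS = {"store", "warehouse"}


def store_shipment_share(origins: list[str]) -> float:
    """Fraction of shipments fulfilled from stores rather than warehouses."""
    if not origins:
        raise ValueError("no shipments to measure")
    counts = Counter(origins)
    unexpected = set(counts) - ORIGINS
    if unexpected:
        raise ValueError(f"unknown origins: {sorted(unexpected)}")
    return counts["store"] / len(origins)


def shipping_cost_warning(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """Directional read: a jump in store share suggests shipping costs will rise.
    The default tolerance is a hypothetical value."""
    return current - baseline > tolerance


before = store_shipment_share(["warehouse"] * 7 + ["store"] * 3)
after = store_shipment_share(["warehouse"] * 5 + ["store"] * 5)
print(shipping_cost_warning(before, after))  # True
```

The share is computed only from routing decisions our system makes, which is why it avoids the noise that other initiatives introduce into the organization-wide shipping cost figure.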
The metrics were identified by the same team that developed the experiments, which meant they were prioritized according to the project’s needs. It was also encouraging to see the team’s emotional investment in the success of the experiments, since the metrics provided real-time validation not only of the experiments but also of the hard work put into them.
Additional benefits of the Platform
One unintended but pleasant consequence of having a business impact dashboard was that it prompted a fresh look at existing business inefficiencies and revealed new opportunities. For example, store shipments would go from normal to excessively high during certain business hours.
The test-and-learn platform also led to highly specialized algorithms for certain types of orders, such as expedited orders, which have turned out to be a huge upgrade for the product.
Conclusion
It has been a game-changing experience to solve business problems creatively and to validate the right solution through rapid experimentation. In our particular scenario, during the test phase, we averaged a +2% gain in gross margin per single-SKU transaction.
Just as importantly, the new test-and-learn platform drastically reduced the cost of validating business hypotheses. The project team has since gone on to enhance the Order Routing product, and consequently its architecture, to support experimentation and analytics as first-class concepts.