Modern-day service orchestration using DSLs

Janvi Parikh

Published: January 21, 2020

Microservices and its challenges

Microservices are a widely adopted architecture today, with tech giants like Uber, Netflix, Google and Amazon swearing by them. But while the world of tech is sitting up and taking notice, microservices come with their own challenges.

For instance, microservices do not live in isolation. They need to communicate with other services, and as the number of microservices increases, their communication and reliance on each other become difficult to manage. What should ideally be a representation of a business process begins to look chaotic.

With multiple such microservices in action, visualizing the impact of a small change on the overall landscape is complicated. Even a few months down the line, anyone looking at the vastly distributed code is likely to struggle to figure out the overall workflow.

This is on top of the confusion the code will cause for a new developer who joins the team. Add to this the need to understand all the responsibilities and concerns across different microservices, and there is yet another challenge to deal with.

Inter-microservice communication

When my team and I at Thoughtworks were presented with a similar situation, we were able to orchestrate our microservices efficiently by implementing a Domain Specific Language (DSL). That is what this blog is about.

I will be citing examples of how DSLs can be used to describe the high-level flow of any specific business process.

Orchestration and Choreography

Both are commonly used patterns in the industry. In a choreographed architecture, microservices are triggered by each other. In an orchestrated architecture, a single microservice orchestrates (or decides) when every other microservice should be called.

In most large-scale microservice implementations we observe both of these patterns addressing different use cases. For the purpose of this blog, I am choosing to explore orchestration a little deeper because I personally find that the process flow is easy to interpret and adding a new flow is a simple task.

Orchestrator

Orchestrator in action at a coffee shop

While it’s unlikely one will ever build microservices to order and serve coffee, I am using this example because it’s simple and helps illustrate the value of DSL-based orchestration as a simplifier of microservice interactions. Plus, this is also my ode to coffee, for all the times it has come to the developer community’s rescue! Interestingly, we used this coffee-related code as the process to test the orchestrator functionalities during our project.

Let’s say, a cafe wants to build an application that helps customers order and receive their coffee. Also, the cafe has ambitious plans of serving tea and beer to its customers in the near future. This calls for a speedy launch for tea and beer services that will follow the successful coffee application.

Orchestrating our solution

The ordering and brewing process involves an interaction of multiple microservices. Here is a list of those interactions:

A customer orders a cup of coffee
The orchestrator determines whether to boilMilk or boilWater and accordingly delegates the responsibility to the experts: milk-service or water-service
Our customer informs us of their desired quantity of sugar to be added to the coffee
The barista-service is called to addCoffee while the sweetener-service ensures adequate sugar is added to the drink
communication-service informs the customer that their coffee is ready to be collected
Our customer makes the payment
payment-service checks the payment status
membership-service activates the customer’s membership
A receipt is generated by the payment-service, once payment is successful
The customer collects their coffee
The customer drinks their coffee
The customer leaves the cafe

Orchestrator in action

While an orchestrator becomes the microservice that helps one view the business process, it does not by itself make the process any easier to identify. It is at this point that we make way for a Domain Specific Language, or DSL.

Enter, DSL

A DSL is a powerful way to describe interactions between various services. Additionally, configuring another process for tea or beer will be convenient via the DSL.

To buy a cup of coffee, tasks or steps need to be taken by both the customer and the cafe. Our coffee ordering process orchestrator, represented by our own DSL, will look something like this:

enterCafe
 .next(orderCoffee)
 .decisionBranch(COFFEE_WITH_MILK = boilMilk, BLACK_COFFEE = boilWater)
 .next(getSugarQuantity)
 .parallel(addCoffee, addSugar)
 .next(callCustomer)
 .next(makePayment)
 .next(checkPaymentStatus)
 .next(activateCafeMembership)
 .next(generateReceipt, dependsOn = checkPaymentStatus)
 .next(collectCoffee)
 .next(drinkCoffee)
 .next(leaveCafe)
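To make the chain above more concrete, here is a minimal sketch, in Java, of how a fluent builder could record such steps. It is an illustration only: ProcessDefinition, CoffeeProcess and their methods are hypothetical names chosen for this example, not the API of the orchestrator we built, and tasks are represented as plain strings to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical fluent builder: each call records one step of the process.
final class ProcessDefinition {
    private final List<String> steps = new ArrayList<>();

    private ProcessDefinition(String firstTask) {
        steps.add("task:" + firstTask);
    }

    static ProcessDefinition startWith(String firstTask) {
        return new ProcessDefinition(firstTask);
    }

    // A single task that follows the previous step.
    ProcessDefinition next(String task) {
        steps.add("task:" + task);
        return this;
    }

    // Tasks that do not depend on each other and may run together.
    ProcessDefinition parallel(String... tasks) {
        steps.add("parallel:" + String.join(",", tasks));
        return this;
    }

    // Maps each outcome of the previous step to the task that handles it.
    ProcessDefinition decisionBranch(Map<String, String> outcomeToTask) {
        // TreeMap keeps the recorded description in a stable, sorted order.
        steps.add("branch:" + new TreeMap<>(outcomeToTask));
        return this;
    }

    // Returns a copy so callers cannot modify the recorded definition.
    List<String> steps() {
        return new ArrayList<>(steps);
    }
}

// The first few steps of the coffee process, expressed with the builder.
final class CoffeeProcess {
    static ProcessDefinition define() {
        return ProcessDefinition.startWith("enterCafe")
            .next("orderCoffee")
            .decisionBranch(Map.of(
                "COFFEE_WITH_MILK", "boilMilk",
                "BLACK_COFFEE", "boilWater"))
            .next("getSugarQuantity")
            .parallel("addCoffee", "addSugar")
            .next("callCustomer");
    }
}
```

Because every method returns the builder itself, the definition reads top to bottom like the business process it describes, which is the main appeal of this style.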

Components and Features of a DSL-based Orchestrator

An orchestrator is built out of components, namely different types of tasks and dependencies, and features that act on them.

A task is a basic building block for an orchestrator. It delegates the unit of work to either a microservice or a customer. A task simply specifies what needs to be done and the relevant microservice is responsible for how to do it.

There are different types of tasks:

1. Customer Task: One that requires customer input, for example where a customer’s preference is needed. Such a task is started and completed only when the customer furnishes the required information. E.g. orderCoffee or makePayment.

2. System Sync Task: One that the orchestrator asks another microservice to perform over a synchronous API call. Ideal candidates for this kind of task are the quicker ones that the orchestrator needs to wait for before moving on to the next task. System Sync Tasks are performed automatically by the orchestrator once all their dependencies have executed successfully. E.g. addSugar or addCoffee.

3. System Async Task: One that publishes an event when triggered. The relevant microservice consumes this event and responds once the task is complete. The Async Task then consumes this response and is marked complete. Like System Sync Tasks, System Async Tasks are executed automatically once their dependencies are complete. E.g. boilMilk or checkPaymentStatus are good use cases for an Async Task.
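The three task types differ mainly in what moves them to the completed state. The sketch below models that difference in Java. All class, field and method names here are assumptions made for illustration; they are not taken from our orchestrator, and the event publisher is reduced to a simple callback.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

enum TaskState { PENDING, STARTED, COMPLETED }

// Hypothetical base type: a task says what should happen, not how.
abstract class OrchestratorTask {
    final String name;
    TaskState state = TaskState.PENDING;

    OrchestratorTask(String name) {
        this.name = name;
    }
}

// Completed only once the customer supplies the required input.
final class CustomerTask extends OrchestratorTask {
    String input;

    CustomerTask(String name) {
        super(name);
    }

    void start() {
        state = TaskState.STARTED;
    }

    void submit(String customerInput) {
        input = customerInput;
        state = TaskState.COMPLETED;
    }
}

// Runs a blocking call and is complete as soon as the call returns.
final class SystemSyncTask extends OrchestratorTask {
    private final Supplier<String> call;
    String result;

    SystemSyncTask(String name, Supplier<String> call) {
        super(name);
        this.call = call;
    }

    void execute() {
        state = TaskState.STARTED;
        result = call.get();
        state = TaskState.COMPLETED;
    }
}

// Publishes an event and stays STARTED until the response is consumed.
final class SystemAsyncTask extends OrchestratorTask {
    private final Consumer<String> publisher;
    String response;

    SystemAsyncTask(String name, Consumer<String> publisher) {
        super(name);
        this.publisher = publisher;
    }

    void trigger() {
        state = TaskState.STARTED;
        publisher.accept(name + ".requested");
    }

    void onResponse(String serviceResponse) {
        response = serviceResponse;
        state = TaskState.COMPLETED;
    }
}
```

Note that only the Async Task has a visible gap between being triggered and being completed; that gap is what makes its dependencies behave differently from those of the other two types.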

There are different kinds of dependencies:

Dependencies help the orchestrator determine when a task can be executed. We wouldn’t want a customer to collectCoffee before all the prerequisite tasks it depends on have been performed. Dependencies can either be inferred implicitly from the process definition or added explicitly.

1. Implicit Dependencies: callCustomer is dependent on addCoffee and addSugar and cannot be executed until they are complete. Such a dependency is inferred implicitly and does not require configuration because addSugar and addCoffee are System Sync Tasks. The same applies to a task that follows a Customer Task.

getSugarQuantity
 .parallel(addCoffee, addSugar)
 .next(callCustomer)

Dependencies on an Async Task are based on the task being triggered, not on its completion. For example, activateCafeMembership can be started as soon as checkPaymentStatus is triggered; it does not have to wait for checkPaymentStatus to complete.

makePayment
 .next(checkPaymentStatus)
 .next(activateCafeMembership)
 .next(generateReceipt, dependsOn = checkPaymentStatus)

2. Explicit Dependencies: However, there will be situations where we will need to wait for an Async Task’s completion. For instance, generateReceipt can begin only after checkPaymentStatus is complete. This makes checkPaymentStatus an explicit dependency for generateReceipt.
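The difference between waiting for a trigger and waiting for completion can be sketched as a readiness check. The Java below is a minimal illustration under assumed names (Progress, DependencyCheck, canStart); how our orchestrator stores this information is not shown in this blog, so the map-based representation is purely an assumption.

```java
import java.util.Map;

// How far a task has progressed; declaration order matters for comparison.
enum Progress { PENDING, TRIGGERED, COMPLETED }

final class DependencyCheck {
    // 'required' holds, per dependency, the minimum progress a task needs.
    // 'actual' holds the progress each task has reached so far.
    static boolean canStart(Map<String, Progress> required,
                            Map<String, Progress> actual) {
        for (Map.Entry<String, Progress> dependency : required.entrySet()) {
            Progress reached =
                actual.getOrDefault(dependency.getKey(), Progress.PENDING);
            // Enum comparison follows declaration order.
            if (reached.compareTo(dependency.getValue()) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

With checkPaymentStatus only triggered, a task that requires TRIGGERED (such as activateCafeMembership) may start, while a task that requires COMPLETED (such as generateReceipt with its explicit dependency) has to keep waiting.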

Ordering and serving a cup of coffee requires more than just executing tasks one after another! Some tasks need to run in parallel, and others need to take a different path based on conditions. We also need different behaviours for when a customer revisits. The orchestrator needs features that can handle such complexity.

Kinds of features:

1. Decision Branching: One can define conditions that determine the next task based on the output of previous steps. In the code snippet below, the decision to boilMilk or boilWater is taken based on whether the customer has ordered a milk coffee or a black coffee.

orderCoffee
 .decisionBranch(COFFEE_WITH_MILK = boilMilk, BLACK_COFFEE = boilWater)

Decision branching is an extremely useful feature, unless our business process is linear.
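Internally, a decision branch can be as simple as a lookup from the previous step’s outcome to the next task. The following Java sketch assumes exactly that; the class DecisionBranch and its nextTask method are hypothetical names, and a real orchestrator may evaluate richer conditions than a single key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class DecisionBranch {
    private final Map<String, String> outcomeToTask;

    DecisionBranch(Map<String, String> outcomeToTask) {
        // Defensive copy so later changes to the argument have no effect.
        this.outcomeToTask = new HashMap<>(outcomeToTask);
    }

    // Returns the task configured for the outcome, or empty if none matches.
    Optional<String> nextTask(String outcome) {
        return Optional.ofNullable(outcomeToTask.get(outcome));
    }
}
```

Returning an empty Optional for an unknown outcome leaves it to the caller to decide whether that is an error or simply the end of the branch.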

2. Parallel Execution of Tasks: One can execute tasks in parallel when they are independent. Tasks like addCoffee and addSugar, which are independent of each other, can be executed in parallel to reduce the overall time it takes to make and receive a coffee.

getSugarQuantity
 .parallel(addCoffee, addSugar)
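One possible way to run such independent tasks together in Java is CompletableFuture from the standard library. The sketch below is an assumption about how a parallel step could be executed, not a description of our implementation; ParallelStep and runAll are hypothetical names, and each task is modelled as a Supplier that returns a short result string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

final class ParallelStep {
    // Starts every task at once and returns only when all have finished.
    // Results are returned in the same order as the tasks were given.
    static List<String> runAll(List<Supplier<String>> tasks) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (Supplier<String> task : tasks) {
            futures.add(CompletableFuture.supplyAsync(task));
        }
        // Wait for the whole group before the next step may begin.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> future : futures) {
            results.add(future.join());
        }
        return results;
    }
}
```

The step that follows, callCustomer in our example, would only be started after runAll returns, which matches the implicit dependency on both addCoffee and addSugar.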

3. Resume: Ideally, the customer experience should be seamless. So, if our customer orders a milk coffee with one cube of sugar to go but subsequently realizes that they have forgotten their wallet at home, we want to remember that customer’s specific order and preferences when they come back to us.

Such a customer who leaves the cafe in the middle of their order is a resume customer and needs to be treated differently than a regular customer.

Such a situation calls for some tasks to be remembered and therefore not re-initiated. Also, there could be some tasks that will have to be re-started for a resume customer. For example, we need not start orderCoffee for this customer but we want to boilMilk again to serve hot coffee.

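class BoilMilk : AsyncStep() {

   // Delegates the actual boiling to milk-service.
   fun execute() : Temperature {
       return milkService.boil()
   }

   // A resume customer should still get hot coffee, so resuming re-runs the task.
   fun resume() : Temperature {
       return this.execute()
   }
}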

At the same time, for a resume customer who has already collected their coffee, it doesn’t make any sense to boilMilk again. Here we need to execute only the tasks after collectCoffee, not those prior to it.

This is where the concept of checkpoints can be really useful. If a resume customer has already crossed a checkpoint, we do not execute any steps prior to the checkpoint when they come back. collectCoffee is configured as a checkpoint so that tasks prior to it are skipped and only the tasks after it are executed.

configureCheckpoints(collectCoffee)
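The resume rules described above can be sketched as a small planning function. This Java example is a simplified illustration under stated assumptions: the process is treated as a flat, ordered list of task names, ResumePlanner and tasksToRun are hypothetical names, and the set of tasks that re-run on resume stands in for tasks like BoilMilk whose resume() calls execute() again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ResumePlanner {
    // Decides which tasks to run when a resume customer comes back.
    static List<String> tasksToRun(List<String> orderedTasks,
                                   Set<String> completed,
                                   Set<String> checkpoints,
                                   Set<String> rerunOnResume) {
        // Skip everything up to the last checkpoint the customer has crossed.
        int start = 0;
        for (int i = 0; i < orderedTasks.size(); i++) {
            String task = orderedTasks.get(i);
            if (checkpoints.contains(task) && completed.contains(task)) {
                start = i + 1;
            }
        }
        List<String> plan = new ArrayList<>();
        for (String task : orderedTasks.subList(start, orderedTasks.size())) {
            // Run what is still open, plus tasks that must repeat on resume.
            if (!completed.contains(task) || rerunOnResume.contains(task)) {
                plan.add(task);
            }
        }
        return plan;
    }
}
```

For a customer who left before paying, the plan would repeat boilMilk but not orderCoffee; for one who had already completed collectCoffee, nothing before that checkpoint would appear in the plan at all.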

4. Reset: A customer ordered a coffee with 2 cubes of sugar which is now ready to be collected. But before collectCoffee they change their mind and want a coffee with only one cube of sugar instead. addSugar is a task that is already complete.

In such situations, where the customer wishes to amend an already completed task, we reset that task and the tasks between it and the customer’s current position.
Such resets can happen only if they have been permitted through configuration.

configureResetTasks(fromTask = collectCoffee, toTask = getSugarQuantity)

On reset, all the tasks between fromTask and toTask are reset. But, some tasks that anyway need to be carried out need not be affected by the reset. For example, resetting activateCafeMembership will only bother the customer with repetitive questions whose responses will not have changed with the reset itself. Which brings us to configuring Non-Resettable Tasks.

configureNonResettableTasks(activateCafeMembership)
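Putting the two reset configurations together, the set of tasks to reset can be computed from the task order. The Java sketch below is an illustration, not our implementation: ResetPlanner and tasksToReset are hypothetical names, the process is treated as a flat ordered list, and it assumes that toTask itself is reset while fromTask, which the customer has not yet completed, is left alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ResetPlanner {
    // Lists the tasks to reset when the customer, currently at fromTask,
    // goes back to change the earlier toTask.
    static List<String> tasksToReset(List<String> orderedTasks,
                                     String fromTask,
                                     String toTask,
                                     Set<String> nonResettable) {
        int to = orderedTasks.indexOf(toTask);
        int from = orderedTasks.indexOf(fromTask);
        if (to < 0 || from < 0 || to > from) {
            throw new IllegalArgumentException(
                "toTask must exist and come before fromTask");
        }
        List<String> reset = new ArrayList<>();
        // Assumption: toTask is included, fromTask is not.
        for (String task : orderedTasks.subList(to, from)) {
            if (!nonResettable.contains(task)) {
                reset.add(task);
            }
        }
        return reset;
    }
}
```

In the coffee example, going back from collectCoffee to getSugarQuantity would reset every task in between except activateCafeMembership, so the customer is not asked the membership questions a second time.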

Open-source orchestrators

We built our own orchestrator, but if you are looking to use one that’s already built, Netflix’s Conductor and Uber’s Cadence are pretty cool and inspiring. Conductor lets you define workflows through a JSON-based DSL, making it simple to use. Cadence defines workflows via code, making it more extensible.

The following code samples show how we can define the process for ordering a coffee using Conductor, and then what its counterpart might look like in Cadence.

Conductor:

{
    "name": "coffee_ordering_process",
    "description": "Order coffee",
    "version": 1,
    "schemaVersion": 1,
    "tasks": [
        {
            "name": "order_coffee",
            "taskReferenceName": "order_coffee_task",
            "inputParameters": {
                "contentId": "${workflow.input.coffee_type}"
            },
            "type": "SIMPLE"
        }
    ]
}

Cadence:

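import com.uber.cadence.activity.ActivityOptions;
import com.uber.cadence.workflow.Saga;
import com.uber.cadence.workflow.Workflow;
import com.uber.cadence.workflow.WorkflowMethod;
import java.util.UUID;

public class CoffeeOrderingProcess {

  private final ActivityOptions options = new ActivityOptions.Builder().build();
  // Stub through which the workflow invokes the CoffeeTasks activities.
  private final CoffeeTasks tasks = Workflow.newActivityStub(CoffeeTasks.class, options);

  @WorkflowMethod
  public void orderCoffee(String customerName) {
    Saga.Options sagaOptions = new Saga.Options.Builder().build();
    Saga saga = new Saga(sagaOptions);
    String orderCoffeeId = tasks.orderCoffee(customerName);
    // Register the action that undoes the order if the saga is compensated.
    saga.addCompensation(tasks::cancelCoffee, orderCoffeeId, customerName);
  }
}

// Activity stubs are created from an interface, so the tasks are declared as one.
public interface CoffeeTasks {
  String orderCoffee(String customerName);

  String cancelCoffee(String orderCoffeeId, String customerName);
}

public class CoffeeTasksImpl implements CoffeeTasks {
  public String orderCoffee(String customerName) {
    System.out.println("orderCoffee for " + customerName);
    return UUID.randomUUID().toString();
  }

  // Receives the order ID and customer name passed to saga.addCompensation.
  public String cancelCoffee(String orderCoffeeId, String customerName) {
    System.out.println("cancelCoffee " + orderCoffeeId + " for " + customerName);
    return UUID.randomUUID().toString();
  }
}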

A few more detailed examples leveraging Conductor and Cadence can be found on GitHub. If your workflow is simple, the obvious choice is Conductor. For developers who prefer code over configuration, Cadence might seem more appealing.

And you can always build your own DSL and orchestrator if your use case is unique. Our coffee ordering application needs features like Resume and Reset, which are not inherently built into any of the open-source orchestrators, so we had to build our own.

To further explore the world of DSLs, I would highly recommend Martin Fowler and Rebecca Parsons’ book on Domain Specific Languages. As Martin put it, “You can get a good grasp of the topic by reading the narrative section (142 pages) and use the rest as a reference to dip into when you need it.”

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
