Integrating event-driven microservices with request/response APIs

[Part three]

Konrad Fögen

Published: May 02, 2024

Introduction

This is the third part of a series of posts about integrating event-driven microservices with request/response APIs. In the previous part, we discussed the benefits of decoupling event retrieval from event processing. Now, we’re going to discuss the use of circuit breakers to deal with the unavailability of request/response APIs.

Lesson learned #3: Use a circuit breaker to pause event retrieval

The tight coupling that’s introduced by request/response-based communication requires both microservices to be available because, unlike event-driven communication, there’s no middleware that can step in if the downstream microservice is temporarily unavailable.

The combination of event loops and request/response-based communication can aggravate the availability problems of a downstream microservice. When the downstream microservice struggles, retries initiated by the event loop put additional pressure on it. If it stops accepting new connections, requests fail fast, so the event loop works through events faster and sends even more requests. The downstream microservice comes under still more pressure while no event can be processed successfully. We have successfully applied circuit breakers to improve situations like this.

The circuit breaker is a resilience pattern that originates from request/response-based communication. It helps avoid chain reactions of cascading retries when a downstream service has problems. Circuit breakers exist as ready-to-use components, such as resilience4j, that can be configured and used with HTTP clients for request/response APIs.

Figure 1: Circuit Breaker State Machine

To summarize, a circuit breaker is a component that can switch between three states: CLOSED, OPEN and HALF-OPEN, as shown in Figure 1. In the CLOSED state, a circuit breaker acts as a passthrough that passes all requests to the API. If errors exceed a threshold, the circuit breaker transitions to OPEN, in which requests are no longer passed through. Instead, the circuit breaker responds with an error, e.g. a NotPermitted exception (CallNotPermittedException in resilience4j). After a period of waiting, the circuit breaker transitions to HALF-OPEN and requests are again passed through to the API. If the requests succeed, the circuit breaker transitions to CLOSED. If they don’t succeed, the circuit breaker transitions back to OPEN.
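The state machine can be sketched in a few lines of Python. This is a simplified illustration, not the implementation of any particular library: the names `CircuitBreaker`, `CallNotPermitted`, `failure_threshold` and `wait_seconds` are hypothetical, the threshold counts consecutive failures (ready-to-use components typically evaluate a failure rate over a sliding window), and a single successful request in HALF-OPEN is enough to close the breaker.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


class CallNotPermitted(Exception):
    """Raised instead of calling the API while the breaker is OPEN."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, wait_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumption: consecutive failures
        self.wait_seconds = wait_seconds
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = None

    def call(self, request):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.wait_seconds:
                raise CallNotPermitted("circuit breaker is OPEN")
            # Wait period elapsed: the incoming request triggers HALF-OPEN.
            self.state = State.HALF_OPEN
        try:
            result = request()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        self.failures = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failures += 1
        # A failed trial request in HALF-OPEN reopens the breaker immediately.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

Note that the transition to HALF-OPEN in this sketch only happens when a request arrives after the wait period.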

Figure 2: Circuit Breaker Integrated into Event-driven Microservice

Circuit breakers can also be integrated into event-driven microservices. Figure 2 depicts the circuit breaker placed between the event processing and the request/response API. We found two specifics that we had to take into account. They are illustrated in Figure 3 and described below.

Figure 3: Circuit Breaker Integrated into Event-driven Microservice

First, if a circuit breaker integrated into an event-driven microservice is OPEN, requests to the API fail fast because the circuit breaker responds with a NotPermitted exception. There is no pressure on the API because the circuit breaker doesn’t forward requests. However, this is inefficient because event processing still fails, events are retried and will eventually end up in a dead letter queue.

To overcome this limitation, we had a good experience with pausing the retrieval of new events when the circuit breaker transitions to OPEN. Ready-to-use circuit breakers provide event listeners that notify us about state transitions. In Figure 3, this is illustrated by the messages “3.1 Notify About State Transition” and “3.2 Pause Event Retrieval”. The latter is only sent if the circuit breaker has transitioned to OPEN.
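A minimal sketch of this listener wiring follows. All names here (`EventRetriever`, `ObservableBreaker`, `pause_on_open`) are hypothetical stand-ins; a real circuit breaker library exposes its own listener API, and a real retriever would poll a message queue rather than a Python list.

```python
class EventRetriever:
    """Stand-in for the polling loop that fetches events from a queue."""

    def __init__(self, queue):
        self.queue = queue
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def poll(self):
        # While paused, nothing is retrieved, so no event fails needlessly.
        if self.paused or not self.queue:
            return None
        return self.queue.pop(0)


class ObservableBreaker:
    """Tracks only the state and notifies listeners about transitions."""

    def __init__(self):
        self.state = "CLOSED"
        self.listeners = []

    def on_state_transition(self, listener):
        self.listeners.append(listener)

    def transition_to(self, new_state):
        old_state, self.state = self.state, new_state
        for listener in self.listeners:
            listener(old_state, new_state)  # 3.1 Notify About State Transition


def pause_on_open(retriever):
    def listener(old_state, new_state):
        if new_state == "OPEN":
            retriever.pause()  # 3.2 Pause Event Retrieval
    return listener
```

Registering `pause_on_open(retriever)` on the breaker means that events still in the queue stay there untouched while the breaker is OPEN, instead of failing fast and moving towards the dead letter queue.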

Second, after a waiting period, the circuit breaker should transition to HALF-OPEN so that requests are passed through to the API again. With request/response-based communication, the microservice with a circuit breaker receives a request. In order to fulfill the request, the microservice itself tries to send a request to an API through the circuit breaker. If the waiting period has elapsed, the circuit breaker uses this request as a trigger to transition to HALF-OPEN and passes the request through. With event-driven communication, once the retrieval of new events has been paused, such an external trigger does not exist. A scheduled action is required to trigger the transition to HALF-OPEN and to resume the retrieval of new events. Otherwise, the circuit breaker would remain OPEN.

In Figure 3, this is illustrated by the messages “3.3 Schedule Transition”, which sets up a scheduled action if the circuit breaker has transitioned to OPEN, and “3.4 Transition to HALF-OPEN”, which happens once the wait time has elapsed. Afterwards, the event listener is notified about the state transition (message “3.1 Notify About State Transition”) and event retrieval is resumed (message “3.5 Resume Event Retrieval”) because the circuit breaker has transitioned to HALF-OPEN.
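The scheduled transition can be sketched with a timer from the Python standard library. This is an illustration under stated assumptions rather than a production design: `PausableBreaker` and its callbacks `pause_retrieval` and `resume_retrieval` are hypothetical names, and the class tracks only the state, not failure counting.

```python
import threading


class PausableBreaker:
    """Pauses event retrieval on OPEN and schedules its own HALF-OPEN transition."""

    def __init__(self, wait_seconds, pause_retrieval, resume_retrieval):
        self.wait_seconds = wait_seconds
        self.pause_retrieval = pause_retrieval
        self.resume_retrieval = resume_retrieval
        self.state = "CLOSED"
        self.timer = None

    def transition_to(self, new_state):
        self.state = new_state
        self._on_state_transition(new_state)  # 3.1 Notify About State Transition

    def _on_state_transition(self, new_state):
        if new_state == "OPEN":
            self.pause_retrieval()  # 3.2 Pause Event Retrieval
            # 3.3 Schedule Transition: no request will arrive while retrieval
            # is paused, so a timer has to act as the trigger.
            self.timer = threading.Timer(
                self.wait_seconds, self.transition_to, args=("HALF_OPEN",)
            )
            self.timer.daemon = True
            self.timer.start()
        elif new_state == "HALF_OPEN":
            # Reached via the timer (3.4 Transition to HALF-OPEN).
            self.resume_retrieval()  # 3.5 Resume Event Retrieval
```

The timer replaces the incoming request that would otherwise trigger the transition. Whether the library in use can schedule this transition itself is worth checking before building it by hand.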

It’s possible to fine-tune circuit breakers further. The visibility timeout of events should be longer than the waiting time before the circuit breaker transitions to HALF-OPEN. If not, the same events are retrieved again and again after the transition; they end up in a dead letter queue if the API is unavailable for an extended period.

We have also found that a circuit breaker integrated into event processing is capable of handling extended periods of unavailability. This means you can adjust two settings: the maxReceiveCount, i.e. the number of times the processing of an event can fail before it’s moved to a dead letter queue, and the visibility timeout of events that failed because of failed requests. As an example, consider a maxReceiveCount of three and a visibility timeout of 30 seconds. Events are retrieved for the first time right away, for the second time after 30 seconds and for the third time after one minute, so the request/response API can be unavailable for up to one minute. If processing fails again on the third retrieval, the events are moved to the dead letter queue.
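The arithmetic behind this example can be written down as a small helper. The function names are hypothetical, and the calculation assumes that failed attempts fail fast and that an event is retrieved again as soon as its visibility timeout expires.

```python
def retrieval_times(max_receive_count, visibility_timeout_seconds):
    """Seconds after the first retrieval at which each attempt starts."""
    return [n * visibility_timeout_seconds for n in range(max_receive_count)]


def tolerated_outage_seconds(max_receive_count, visibility_timeout_seconds):
    """Time between the first and the last attempt before the dead letter queue."""
    return (max_receive_count - 1) * visibility_timeout_seconds


# The example from the text: three attempts, 30 seconds apart.
print(retrieval_times(3, 30))            # [0, 30, 60]
print(tolerated_outage_seconds(3, 30))   # 60
```

Raising either value extends the outage an event can survive: five attempts with a visibility timeout of 60 seconds, for instance, cover four minutes under the same assumptions.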

Conclusion

As in the previous posts, it’s important to note that when you integrate event-driven microservices with request/response APIs, event processing depends on the availability of the APIs. In this piece we’ve looked at how you can integrate circuit breakers and how to configure them considering the specifics of event-driven microservices. In the fourth and final piece we’ll explore using rate limiting as a means of avoiding exceeding usage limits during event processing.

You can read the rest of the series:

Retries are inevitable, and event processing, including the request/response APIs, should be idempotent (Part One).
The response times of request/response APIs impact the performance of event-driven microservices. If response times fluctuate significantly, it makes sense to decouple event retrieval from event processing (Part Two).
Circuit breakers can improve the integration with request/response APIs as they handle periods of unavailability (Part Three).
Request/response APIs may have usage limits, and rate limits help adhere to them (Part Four).

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
