The life of system engineers would be a lot easier if they could squeeze all computing needs in a single computing unit. Reality is tougher. The complexity of systems is broken down into several components, some of them so resource intensive that they require their own computing instances. Multiple instances bring the communication of electronic devices to the table and there are countless things that can go wrong there. Good system design is the realization that those things should be handled gracefully, according to the rules of a discipline known as Fault Tolerance.
Consider two business applications that interact through a web API. One application performs currency conversions and the other one uses this service to show the price of investment assets in the currency of the country where the customer resides. From the customer perspective, what would happen if the connection between the two applications is interrupted? Keeping the capability of trading assets is critical for the business, thus a graceful way to handle the unavailability of the currency converter is to show to customers prices in the currency where the asset is traded. Assets in a US stock exchange would be shown in USD. In Belgium, it would be shown in EUR.
What would trigger this behavioural change, also known as graceful degradation? A mechanism should be in place to automatically adapt to predictable circumstances maintaining the service operational with limited functionalities. If an application is calling a web API, it is easy to predict that the API might be unavailable at some point in time and the application should be ready to switch to a plan B.
The Java code below is a simple implementation of the circuit breaker pattern to change the application behaviour if the API call is not successful after a few attempts. It wraps the API call, watching for failures, and once the failures reach a certain threshold, the circuit opens, no further calls are made at that point, and the flow changes to do something else. Follow the documentation in the code to understand the logic.
This is a simple implementation that targets API calls. It may need to be adapted to deal with files, databases, messaging services, and other connections that require fault tolerance. For a more generic solution, consider adopting an existing library such as Netflix Hystrixand Resilience4j.
Now, let’s see how to use it. The code below calls an endpoint:
Please, let me know your experience using this implementation and which improvements you made on top of it.