Wednesday, February 6, 2013

Breaking the Circuit Breaker

The circuit breaker is this wonderful pattern to protect your application against resources that fail slowly. The idea is that you stop trying to use a resource when it has too many failures. Regular retries test the resource and will make the resource available again. The benefit is that your application can react quickly to a failed resource instead of hogging CPU, threads, network, etc. while you are waiting to find out the resource is unavailable.

So what's wrong?

Its the metaphor. In the classical description a circuit breaker has 3 states: the open state, the closed state and the half-open state. So what does it mean when the circuit breaker is open? When is a bridge open? When you can drive over it, or when you can sail through it? Only when you look at the first image you may see that a traditional open circuit breaker stops flow of electricity. To us that translates to no usage of the resource. In the 'closed' state electricity flows, which translates to having access to our resource. Now read that again and see if you can remember that!

Then we have a half-open state? Again, look at the first image. For such a switch half-open is still open. (A half-open bridge lets no traffic trough at all but that is another topic.) Why do we need the half-open state anyway? In the classical description we attempt to use the resource once while in this state. If it fails just once, we go back to the open state. This seems like a good idea, but let us think of modern networked applications. In such applications many requests are done simultaneously. So as soon as we switch to the half-open state for a retry, many, maybe hundreds of request will immediately try to use the resource, even if it is still down. This is exactly what we were trying to prevent!

Stop!

Although the circuit breaker is a great invention, I think we need a new metaphor, or at least some new terminology.

No more half-open

The first thing we can do is get rid of the half-open state. Instead, when its time to retry, we just let 1 client through to the resource. While that check is in progress we keep denying access to the resource for other clients; we stay in the same state. Only when the single check succeeds, we switch to the state in which we allow full access to the resource.

No more open

The second thing we need to do is to end the confusion on what it means to be 'open'. Instead I propose we call this state the broken state. No further explanation required. Good. In this state we do the regular retries.

Finally, to make things symmetric, I propose to rename the 'closed' state to flow state as all requests are granted.

Metaphor

Above I proposed new terminology but I failed to provide a new metaphor. Unfortunately metaphors are hard to find and too easy to get wrong. Perhaps a good metaphor should be related to the fact that we are limiting the number of errors we tolerate from a resource. If you have an idea, please let me know in a comment. I hope you liked my little rant. Any comments are always welcome.

—   ❧   —

Postscript: Sentries and the circuit breaker

The Sentries library contains a highly optimized circuit breaker implementation for Scala programs. The ideas in this article developed while writing Sentries. Feel free to have a look. As you can see there are only 2 states, the FlowState and the BrokenState. Note that the retryAt in BrokenState is a val; it can not be changed after initialization. When it is time to retry we replace the broken state with a new instance (in method attemptResetBrokenState).