Saga guarantees that either all sub-transactions are committed or the completed ones are compensated, but the saga system itself may crash. A saga may be in one of the following states when it crashes:
- Saga received a request, but no transaction started yet. States of services involved in sub-transactions are not modified by saga, nothing has been done.
- Some sub-transactions were done. When restarted, saga has to resume from the last completed transaction.
- A sub-transaction was started, but not completed yet. Since we are not sure if the remote service completed the transaction, failed the transaction, or the request to remote service timed out, saga has to redo the sub-transaction on restart. That also means sub-transactions have to be idempotent.
- A sub-transaction failed and its compensating transaction has not started yet. Saga has to resume from the compensating transaction on restart.
- A compensating transaction was started, but not completed yet. The solution is basically the same as the previous one: saga redoes the compensating transaction on restart. That means compensating transactions have to be idempotent too.
- All sub-transactions or compensating transactions were done, which is the same as the first case.
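The idempotency requirement from the list above can be illustrated with a minimal sketch. It assumes each request carries a unique ID that the remote service can use to detect retries; the `CarRentalService` class, the `request_id` field, and the response shape are hypothetical and not part of the saga implementation described here.

```python
class CarRentalService:
    """Hypothetical remote service that deduplicates requests by ID."""

    def __init__(self):
        self.responses = {}  # request_id -> response already produced
        self.bookings = 0    # number of bookings actually made

    def book(self, request_id, customer):
        # A retried request gets the stored response; no second booking is made.
        if request_id in self.responses:
            return self.responses[request_id]
        self.bookings += 1
        response = {"request_id": request_id, "customer": customer, "status": "booked"}
        self.responses[request_id] = response
        return response


service = CarRentalService()
first = service.book("req-1", "alice")
# After a crash, saga cannot tell whether req-1 went through, so it sends it again.
retry = service.book("req-1", "alice")
```

Because the second call returns the stored response, saga can safely redo a sub-transaction whose outcome it does not know.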
In order for saga to recover to the states mentioned above, we have to keep track of each step of sub-transactions and compensating transactions. We decided to achieve that by saving the following events in a persistent store called saga log:
- Saga started event stores the entire saga request, which includes multiple transaction/compensation requests
- Transaction started event stores individual transaction request
- Transaction ended event stores individual transaction request and its response
- Transaction aborted event stores individual transaction request and the cause of failure
- Transaction compensated event stores individual compensation request and its response
- Saga ended event marks the end of a saga request and stores nothing
By persisting these events in the saga log, a crashed saga can be recovered from any of the states above.
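As a rough sketch of how the persisted events could drive recovery, the function below replays a list of events and decides what a restarted saga should do next. The event names, the `type` and `tx` fields, and the shape of the returned plan are assumptions made for illustration; the actual implementation may differ.

```python
SAGA_STARTED = "SagaStarted"
TX_STARTED = "TransactionStarted"
TX_ENDED = "TransactionEnded"
TX_ABORTED = "TransactionAborted"
TX_COMPENSATED = "TransactionCompensated"
SAGA_ENDED = "SagaEnded"


def recovery_plan(events):
    """Replay logged events and return what a restarted saga should do."""
    started = []  # transaction names in the order they were started
    ended, aborted, compensated = set(), set(), set()
    saga_started = saga_ended = False
    for event in events:
        kind, tx = event["type"], event.get("tx")
        if kind == SAGA_STARTED:
            saga_started = True
        elif kind == SAGA_ENDED:
            saga_ended = True
        elif kind == TX_STARTED and tx not in started:
            started.append(tx)
        elif kind == TX_ENDED:
            ended.add(tx)
        elif kind == TX_ABORTED:
            aborted.add(tx)
        elif kind == TX_COMPENSATED:
            compensated.add(tx)

    # Nothing was started, or everything finished: there is nothing to recover.
    if not saga_started or saga_ended:
        return {"direction": "none", "redo": [], "compensate": []}

    # Started but neither ended nor aborted: the outcome is unknown, so redo.
    in_flight = [tx for tx in started if tx not in ended and tx not in aborted]
    if aborted:
        # Backward recovery: compensate completed transactions in reverse order.
        pending = [tx for tx in reversed(started)
                   if tx in ended and tx not in compensated]
        return {"direction": "backward", "redo": in_flight, "compensate": pending}
    return {"direction": "forward", "redo": in_flight, "compensate": []}


crashed_mid_booking = [
    {"type": SAGA_STARTED},
    {"type": TX_STARTED, "tx": "flight"},
    {"type": TX_ENDED, "tx": "flight"},
    {"type": TX_STARTED, "tx": "car"},
]
plan = recovery_plan(crashed_mid_booking)
```

Here `plan` asks saga to continue forward and resend the car rental request, which matches the "started, but not completed yet" case.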
Since saga only needs events to be persisted and the event contents are stored as JSON, the implementation of the saga log is very flexible. Databases (SQL or NoSQL), durable message queues, or even plain files can be used as event storage, though some allow saga to recover faster than others.
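To show how little the saga log needs from its storage, here is a minimal sketch of a plain-file log that appends one JSON document per line. The `FileSagaLog` class, the file name, and the event fields are hypothetical; a production log would also need to handle concurrent writers and partially written lines.

```python
import json
import os
import tempfile


class FileSagaLog:
    """Hypothetical append-only saga log backed by a JSON Lines file."""

    def __init__(self, path):
        self.path = path

    def append(self, event):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to persist the event before returning

    def read_all(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


written = [
    {"type": "SagaStarted", "request": {"customer": "alice"}},
    {"type": "TransactionStarted", "tx": "flight"},
]
with tempfile.TemporaryDirectory() as directory:
    log = FileSagaLog(os.path.join(directory, "saga.log"))
    for event in written:
        log.append(event)
    # A restarted saga would create a new FileSagaLog on the same path and replay it.
    replayed = FileSagaLog(log.path).read_all()
```

Replaying a file means reading it from the beginning, which is one reason an indexed store may recover faster for long-running sagas.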
Request Data Structure of Saga
In our case, flight booking, car rental, and hotel reservation have no dependency among each other at all and they can be processed in parallel. However, it's more user friendly for our customers to only charge their credit cards once, when all the bookings are done successfully. That means the transactions of the four services look like the graph below.
It makes sense to implement the data structure of the trip planning request as a Directed Acyclic Graph. The root of the graph is the saga start task and the leaf is the saga end task.
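A minimal sketch of such a request graph, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+), is shown below. The task names and the dictionary layout are assumptions for illustration and do not reflect the actual request JSON.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can run.
trip_request = {
    "saga_start": set(),
    "flight_booking": {"saga_start"},
    "car_rental": {"saga_start"},
    "hotel_reservation": {"saga_start"},
    "payment": {"flight_booking", "car_rental", "hotel_reservation"},
    "saga_end": {"payment"},
}


def execution_waves(graph):
    """Group tasks into waves; tasks within one wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(trip_request)
```

The three bookings land in the same wave, and payment only becomes ready once all of them are done, so the customer is charged once.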
As mentioned above, flight booking, car rental, and hotel reservation can be processed in parallel. But doing so creates another problem: what if flight booking failed while car rental is being processed? We can't keep waiting for the response from car rental service, because we have no idea how long it will take.
The best thing we can do is to send the car rental request again and hope we get a response so that we can continue our backward recovery. If the car rental service never responds, we may have to fall back to manual intervention.
The delayed booking request may still reach the remote car rental service later. By the time it arrives, the service has already processed the same booking and its cancellation request.
Therefore, services have to implement transactions and compensations in such a way that a transaction request received after its compensation request has no effect. Caitie McCaffrey called this a commutative compensating request in her talk Distributed Sagas: A Protocol for Coordinating Microservices.
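One possible way for a service to satisfy this requirement is to remember which requests have been cancelled and ignore any booking that arrives afterwards. The sketch below assumes that a booking and its cancellation share a request ID; the `CommutativeCarRental` class and its return values are hypothetical.

```python
class CommutativeCarRental:
    """Hypothetical service where a booking and its cancellation commute."""

    def __init__(self):
        self.bookings = {}      # request_id -> booking status
        self.cancelled = set()  # request IDs whose cancellation was received

    def book(self, request_id):
        # A booking that arrives after its cancellation has no effect.
        if request_id in self.cancelled:
            return "ignored"
        self.bookings[request_id] = "booked"
        return "booked"

    def cancel(self, request_id):
        # Record the cancellation even if the booking has not arrived yet.
        self.cancelled.add(request_id)
        self.bookings.pop(request_id, None)
        return "cancelled"


in_order = CommutativeCarRental()
in_order.book("req-1")
in_order.cancel("req-1")

delayed = CommutativeCarRental()
delayed.cancel("req-1")
late_result = delayed.book("req-1")  # the delayed booking finally arrives
```

Both services end up with no active booking regardless of the order in which the two requests arrived.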
ACID and Saga
ACID is a consistency model with the following properties: atomicity, consistency, isolation, and durability.
Saga does not provide the ACID guarantee, because atomicity and isolation are not satisfied. According to the paper:
"full atomicity is not provided. That is, sagas may view the partial results of other sagas"
With a durable saga log, saga guarantees consistency and durability.
Finally, the architecture of our saga looks like this:
- Saga execution component parses the request JSON and builds a graph of requests
- TaskRunner ensures the execution order of requests with task queue
- TaskConsumer processes tasks to write events to saga log and send requests to remote services
In this article, we talked about how saga can be implemented with a saga log that persists transaction and compensation events, and with a request graph. A crashed saga can be recovered on restart from the events persisted in the saga log. However, the microservices involved must meet a few design requirements for saga's consistency guarantee to hold:
- transaction and compensation requests must be idempotent
- compensation requests must be commutative
- Original Paper on Sagas by Hector Garcia-Molina & Kenneth Salem