Distributed Transactions in the Microservice

The Saga Pattern is a microservices architectural pattern for implementing a transaction that spans multiple services.

A saga is a sequence of local transactions. Each service in a saga performs its own transaction and publishes an event. The other services listen to that event and perform the next local transaction. If one transaction fails for some reason, the saga also executes compensating transactions to undo the impact of the preceding transactions.

Let’s see a simple example in a typical food delivery app flow.

When a user places an order, the sequence of actions could look like this:

The food order service creates an order. At this point, the order is in a PENDING state. A saga manages the chain of events.
The saga contacts the restaurant via the restaurant service.
The restaurant service attempts to place the order with the chosen restaurant. Once the restaurant responds, it sends back a reply.
The saga receives the reply and, depending on its content, approves or rejects the order.
The food order service then changes the state of the order. If the order was approved, it informs the customer of the next steps. If it was rejected, it informs the customer with an apology message.
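
The flow above can be sketched in a few lines of Python. This is a minimal in-memory sketch rather than a production design: the names OrderService, RestaurantService and OrderSaga, the notification texts, and the rule that a restaurant confirms only items on its menu are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class OrderState(Enum):
    PENDING = "PENDING"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"


@dataclass
class Order:
    order_id: int
    item: str
    state: OrderState = OrderState.PENDING


class RestaurantService:
    """Hypothetical participant: confirms an order only if the item is on the menu."""

    def __init__(self, menu):
        self.menu = set(menu)

    def place_order(self, order):
        return order.item in self.menu  # True = confirmed, False = declined


class OrderService:
    """Owns the order records and performs the local state changes."""

    def __init__(self):
        self.orders = {}
        self.notifications = []

    def create_order(self, item):
        order = Order(order_id=len(self.orders) + 1, item=item)
        self.orders[order.order_id] = order
        return order

    def approve(self, order):
        order.state = OrderState.APPROVED
        self.notifications.append(f"Order {order.order_id} confirmed")

    def reject(self, order):
        # Compensating transaction: the pending order ends up rejected.
        order.state = OrderState.REJECTED
        self.notifications.append(f"Sorry, order {order.order_id} was rejected")


class OrderSaga:
    """Drives the chain of steps for a single order."""

    def __init__(self, order_service, restaurant_service):
        self.order_service = order_service
        self.restaurant_service = restaurant_service

    def run(self, item):
        order = self.order_service.create_order(item)           # order starts as PENDING
        confirmed = self.restaurant_service.place_order(order)  # ask the restaurant
        if confirmed:
            self.order_service.approve(order)
        else:
            self.order_service.reject(order)
        return order
```

Calling OrderSaga(...).run("pizza") against a restaurant whose menu contains "pizza" leaves the order APPROVED; any other item leaves it REJECTED and queues the apology message.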

Types of Sagas

There are two types of Sagas:

Orchestration-Based Saga

In this approach, there is a Saga orchestrator that manages all the transactions and directs the participant services to execute local transactions based on events. This orchestrator can also be thought of as a Saga Manager.

Choreography-Based Saga

In this approach, there is no central orchestrator. Each service participating in the Saga performs its transaction and publishes events. The other services act upon those events and perform their own transactions. They may or may not publish further events, depending on the situation.
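
To make the contrast concrete, here is a rough choreography sketch of the same food order flow. The EventBus class is a synchronous in-memory stand-in for a real message broker, and the event names (OrderCreated, OrderConfirmed, OrderDeclined) are assumptions chosen for this example.

```python
from collections import defaultdict


class EventBus:
    """Tiny synchronous in-memory stand-in for a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in list(self.handlers[event_type]):
            handler(payload)


class OrderService:
    def __init__(self, bus):
        self.bus = bus
        self.states = {}
        bus.subscribe("OrderConfirmed", self.on_confirmed)
        bus.subscribe("OrderDeclined", self.on_declined)

    def create_order(self, order_id, item):
        self.states[order_id] = "PENDING"
        self.bus.publish("OrderCreated", {"order_id": order_id, "item": item})

    def on_confirmed(self, event):
        self.states[event["order_id"]] = "APPROVED"

    def on_declined(self, event):
        # Compensating transaction for the earlier "create order" step.
        self.states[event["order_id"]] = "REJECTED"


class RestaurantService:
    def __init__(self, bus, menu):
        self.bus = bus
        self.menu = set(menu)
        bus.subscribe("OrderCreated", self.on_order_created)

    def on_order_created(self, event):
        # No orchestrator tells this service what to do; it reacts to the event.
        outcome = "OrderConfirmed" if event["item"] in self.menu else "OrderDeclined"
        self.bus.publish(outcome, {"order_id": event["order_id"]})
```

Note that no component holds the whole workflow: the sequence only exists as the set of subscriptions, which is what makes choreography loosely coupled but harder to trace.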

Advantages and Disadvantages of the Saga Pattern

The main benefit of the Saga Pattern is that it helps maintain data consistency across multiple services without tight coupling. This is an extremely important aspect for a microservices architecture.

However, the main disadvantage of the Saga Pattern is its complexity from a programming point of view. Developers are also less accustomed to writing Sagas than traditional transactions. The other challenge is that compensating transactions have to be designed for every step to make Sagas work.

In my opinion, Sagas can help solve certain challenges and scenarios. They should be adopted or explored if the need arises. However, I would love to hear whether others have used the Saga Pattern and how the experience was. What frameworks (if any) did you use?

A Saga represents a high-level business process (such as booking a trip) that consists of several low-level Requests that each update data within a single service. Each Request has a Compensating Request that is executed when the Request fails or the saga is aborted.

Distributed Saga Guarantee

Amazingly, a distributed saga guarantees one of the following two outcomes:

Either all Requests in the Saga are successfully completed, or
A subset of Requests and their Compensating Requests are executed.

The catch is that for distributed sagas to work, both Requests and Compensating Requests need to have certain characteristics:

Requests and Compensating Requests must be idempotent, because the same message may be delivered more than once. However many times the same idempotent request is sent, the resulting outcome must be the same. An example of an idempotent operation is an UPDATE that sets a field to a fixed value. An example of an operation that is NOT idempotent is a CREATE operation that generates a new id every time.
Compensating Requests must be commutative, because messages can arrive out of order. In the context of a distributed saga, it’s possible that a Compensating Request arrives before its corresponding Request. If a BookHotel completes after CancelHotel, we should still arrive at a cancelled hotel booking (not re-create the booking!).
Requests can abort, which triggers a Compensating Request. Compensating Requests CANNOT abort; they have to execute to completion no matter what.
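
One way to get both properties is to key every booking by an ID supplied by the caller and to record cancellations even when no booking exists yet. The sketch below assumes a hypothetical HotelService with an in-memory dictionary standing in for its database; the method names mirror the BookHotel and CancelHotel requests mentioned above.

```python
class HotelService:
    """Hypothetical participant keyed by a caller-supplied saga ID."""

    def __init__(self):
        self.bookings = {}  # saga_id -> "BOOKED" or "CANCELLED"

    def book_hotel(self, saga_id):
        # Idempotent: the caller supplies the ID, so a redelivered request
        # finds the existing record instead of creating a second booking.
        # Commutative with cancel: a cancellation that arrived first wins.
        if saga_id not in self.bookings:
            self.bookings[saga_id] = "BOOKED"
        return self.bookings[saga_id]

    def cancel_hotel(self, saga_id):
        # Records the cancellation even if no booking exists yet, so a late
        # book_hotel cannot re-create the booking.
        self.bookings[saga_id] = "CANCELLED"
        return self.bookings[saga_id]
```

With this design, calling cancel_hotel("s2") followed by book_hotel("s2") still ends in a cancelled booking, and repeating either call changes nothing.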

Distributed Saga Implementation Approaches

There are a couple of different ways to implement a Saga transaction, but the two most popular are:

Event-driven choreography: When there is no central coordination, each service produces events, listens to other services’ events, and decides whether an action should be taken.
Command/Orchestration: When a coordinator service is responsible for centralizing the saga’s decision making and sequencing business logic.

In this guide, we’ll look at the latter. With the orchestration approach, we define a new Saga Execution Coordinator service whose sole responsibility is to manage a workflow and invoke downstream services when it needs to.

Saga Execution Coordinator

The Saga Execution Coordinator is an orchestration service that:

Stores & interprets a Saga’s state machine
Executes the Requests of a Saga by talking to other services
Handles failure recovery by executing Compensating Requests
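
The three responsibilities above can be illustrated with a small in-process coordinator. This is a sketch under several assumptions: the Step and SagaExecutionCoordinator names are invented here, Requests are plain Python callables that raise to signal an abort, the log list stands in for durable storage, and a bounded retry loop approximates the rule that Compensating Requests must run to completion.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    request: Callable[[], None]     # may raise to signal an abort
    compensate: Callable[[], None]  # must eventually succeed


class SagaExecutionCoordinator:
    def __init__(self, steps: List[Step], max_compensation_attempts: int = 5):
        self.steps = steps
        self.max_compensation_attempts = max_compensation_attempts
        self.state = "NOT_STARTED"
        self.log: List[str] = []    # stands in for a durable saga log

    def run(self) -> str:
        self.state = "RUNNING"
        for index, step in enumerate(self.steps):
            try:
                step.request()
                self.log.append(f"done:{step.name}")
            except Exception:
                self.log.append(f"failed:{step.name}")
                # The failed Request is compensated too, since it may have
                # partly taken effect before failing.
                self._compensate(self.steps[: index + 1])
                self.state = "ABORTED"
                return self.state
        self.state = "COMPLETED"
        return self.state

    def _compensate(self, steps: List[Step]) -> None:
        for step in reversed(steps):
            for _ in range(self.max_compensation_attempts):
                try:
                    step.compensate()
                    self.log.append(f"compensated:{step.name}")
                    break
                except Exception:
                    continue  # compensations are retried, not abandoned
            else:
                raise RuntimeError(f"compensation for {step.name} needs manual repair")
```

Because a failed Request is compensated as well, this sketch relies on the Compensating Requests being idempotent and commutative, as described earlier.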

Saga Pattern Tips

Create a Unique ID per Transaction

Having a unique identifier for each transaction is a common technique for traceability, but it also gives participants a standard way to request data from each other. By using a transaction ID, for instance, Delivery Service could ask Stock Service where to pick up the products and double-check with the Payment Service whether the order was paid.

Add the Reply Address Within the Command

Instead of designing your participants to reply to a fixed address, consider sending the reply address within the message. This way, you enable your participants to reply to multiple orchestrators.
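
A command carrying its own reply address might look like the following. The Broker class is an in-memory stand-in for a real queueing system, and the queue names and the field names transaction_id and reply_to are assumptions for illustration, not part of any particular broker's API.

```python
from collections import defaultdict, deque


class Broker:
    """In-memory stand-in for a message broker with named queues."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def send(self, queue_name, message):
        self.queues[queue_name].append(message)

    def receive(self, queue_name):
        return self.queues[queue_name].popleft()


def payment_service_handle(broker):
    """Processes one command and replies to whatever address the command names."""
    command = broker.receive("payment-commands")
    reply = {
        "transaction_id": command["transaction_id"],  # lets the caller correlate the reply
        "type": "PaymentExecuted",
    }
    broker.send(command["reply_to"], reply)
```

The Payment Service never hard-codes who is listening: two different orchestrators can put commands on the same queue and each receives its reply on the queue it named.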

Idempotent Operations

If you are using queues for communication between services (like SQS, Kafka, RabbitMQ, etc.), I personally recommend making your operations idempotent. Most of these systems might deliver the same message more than once.

It might also increase the fault tolerance of your service. Quite often, a bug in a client triggers or replays unwanted messages and messes up your database.
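
A common way to make a consumer idempotent is to remember which message IDs it has already handled. The sketch below assumes every message carries a unique message_id field and uses a hypothetical StockService; a real service would typically persist the processed IDs together with the stock data rather than keep them in memory.

```python
class StockService:
    """Hypothetical consumer that ignores messages it has already processed."""

    def __init__(self, stock):
        self.stock = dict(stock)
        self.processed_ids = set()  # in-memory here; persisted in a real service

    def handle(self, message):
        if message["message_id"] in self.processed_ids:
            return False            # duplicate delivery: do nothing
        self.stock[message["sku"]] -= message["quantity"]
        self.processed_ids.add(message["message_id"])
        return True
```

Delivering the same message twice then decrements the stock only once; the second call returns False and leaves the data untouched.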

Avoiding Synchronous Communications

As the transaction progresses, include in each message all the data needed to execute the operation. The goal is to avoid synchronous calls between services just to request more data, which lets your services execute their local transactions even when other services are offline.

The downside is that your orchestrator will be slightly more complex as you will need to manipulate the requests/responses of each step, so be aware of the tradeoffs.

Let’s see what it looks like using our previous e-commerce example:

Order Service saves a pending order and asks Order Saga Orchestrator (OSO) to start a create order transaction.
OSO sends an Execute Payment command to Payment Service, and it replies with a Payment Executed message.
OSO sends a Prepare Order command to Stock Service, and it replies with an Order Prepared message.
OSO sends a Deliver Order command to Delivery Service, and it replies with an Order Delivered message.

In the case above, Order Saga Orchestrator knows the flow needed to execute a “create order” transaction. If anything fails, it is also responsible for coordinating the rollback by sending commands to each participant to undo the previous operation.

A standard way to model a saga orchestrator is a State Machine where each transition corresponds to a command or message. State machines are an excellent pattern for structuring well-defined behavior, as they are easy to implement and particularly easy to test.
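
As a rough illustration, the happy path of the Order Saga Orchestrator can be written as a transition table. The state names and the StartSaga trigger are assumptions made for this sketch; the command and reply names follow the steps listed above.

```python
# (current state, incoming message) -> (next state, command to send or None)
TRANSITIONS = {
    ("STARTED", "StartSaga"): ("AWAITING_PAYMENT", "ExecutePayment"),
    ("AWAITING_PAYMENT", "PaymentExecuted"): ("AWAITING_STOCK", "PrepareOrder"),
    ("AWAITING_STOCK", "OrderPrepared"): ("AWAITING_DELIVERY", "DeliverOrder"),
    ("AWAITING_DELIVERY", "OrderDelivered"): ("COMPLETED", None),
}


class OrderSagaOrchestrator:
    def __init__(self):
        self.state = "STARTED"
        self.sent_commands = []  # stands in for commands put on a queue

    def on_message(self, message_type):
        key = (self.state, message_type)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {message_type!r} in state {self.state!r}")
        self.state, command = TRANSITIONS[key]
        if command is not None:
            self.sent_commands.append(command)
        return self.state
```

Keeping the transitions in a plain table is what makes the orchestrator easy to test: each row can be checked in isolation, and any message that arrives in the wrong state is rejected explicitly.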

Rolling Back in Saga’s Command/Orchestration

Rollbacks are a lot easier when you have an orchestrator to coordinate everything: it knows which steps have completed and can send an undo command to each of those participants.
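
In the simplest case, the orchestrator only needs a mapping from each forward command to its undo command. The undo command names below (RefundPayment, ReleaseStock, CancelDelivery) are hypothetical; the original example does not name them.

```python
# Forward command -> hypothetical undo command for that participant.
UNDO = {
    "ExecutePayment": "RefundPayment",
    "PrepareOrder": "ReleaseStock",
    "DeliverOrder": "CancelDelivery",
}


def rollback_commands(completed_commands):
    """Return the undo commands for the completed steps, most recent first."""
    return [UNDO[command] for command in reversed(completed_commands)]
```

If delivery fails after payment and stock preparation have succeeded, rollback_commands(["ExecutePayment", "PrepareOrder"]) yields the stock release first and the refund second.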

Example: Orchestration-based saga

An e-commerce application that uses this approach would create an order using an orchestration-based saga that consists of the following steps:

The Order Service creates an Order in a pending state and creates a CreateOrderSaga.
The CreateOrderSaga sends a ReserveCredit command to the Customer Service.
The Customer Service attempts to reserve credit for that Order and sends back a reply.
The CreateOrderSaga receives the reply and sends either an ApproveOrder or RejectOrder command to the Order Service.
The Order Service changes the state of the order to either approved or cancelled.
