Saga Pattern in Practice — Designing Compensating Transactions for Microservice Distributed Systems (Choreography vs Orchestration)

Do you know the first wall you hit after migrating to microservices? For me, it was the question: "The order was created, but the payment failed. How do I roll back the inventory?" What used to be solved with a single @Transactional in the monolithic days becomes a completely different world the moment you split into three services. The DBs are separate, the network can go down, and the compensation logic starts to tangle.

The Saga pattern is one of the most widely adopted approaches in production to address this problem. Instead of locking everything like 2PC (Two-Phase Commit), each service commits only to its own DB, and on failure, a "compensating transaction" logically reverses the change. In this post, we directly compare the two implementation approaches — Choreography and Orchestration — with TypeScript code, and examine how to design compensating events in real-world scenarios.

After reading this, you should have a solid sense of "which approach to use in which situation" and "why the Outbox pattern is practically mandatory." The code in this post is TypeScript-based, and backend developers who have worked with message brokers like Kafka will find it easier to follow along.

Core Concepts

The Problem Saga Solves

When handling transactions spanning multiple services in a distributed system, there are broadly two options: enforce atomicity with an atomic commit protocol like 2PC, which holds locks on every participant until the commit decision, or achieve consistency "eventually" with compensating transactions, as Saga does.

The reasons why 2PC is hard to choose in practice are clear. If the coordinator dies after participants have voted, those participants block with their locks held until it recovers, and even in the normal case throughput drops sharply while locks are held. And when systems that can't participate in a distributed transaction are involved, such as Kafka or external payment APIs, 2PC may not be possible at all.

Saga approaches this problem from a different angle.

text

[Traditional Distributed Transaction — 2PC]
BEGIN DISTRIBUTED TX
  Order Service.insert()     ← lock
  Inventory Service.update() ← lock
  Payment Service.charge()   ← lock
COMMIT (or ROLLBACK)  ← all at once
 
[Saga Pattern]
Order Service.insert()     → commit → publish event
Inventory Service.update() → commit → publish event
Payment Service.charge()   → commit  (on failure: run compensating events in reverse)

Eventual Consistency: The property that all services' states converge over time, rather than being consistent at every instant. Saga intentionally accepts this trade-off.

To be honest, Saga gives up isolation. Between the time an order is created and the payment is processed, another Saga may read the intermediate state of an order that hasn't yet completed. Saga literature lists lost updates, dirty reads, and non-repeatable reads as the anomalies that follow from this; this post refers to the read-side problem as Intermediate State Visibility. It differs from a database-level dirty read in that the data is already committed within each service, and only the overall Saga is incomplete. It's worth confirming upfront whether exposing such intermediate states is acceptable for your business requirements.

What Is a Compensating Transaction?

A compensating transaction is not a DB rollback. It is a new, reverse operation that undoes an already-committed state.

Original Action	Compensating Transaction
Create order (status: PENDING)	Cancel order (status: CANCELLED)
Reserve inventory (reserved: +10)	Release inventory (reserved: -10)
Process payment (charged: $50)	Refund payment (refunded: $50)

The important point is that compensating transactions can also fail. That's why all compensating actions should be designed to be idempotent. If the same refund request comes in twice, it should only be processed once.
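As a minimal sketch of what "processed only once" can look like, the following uses an in-memory map as a stand-in for a refunds table with a unique constraint on the idempotency key. `RefundLedger`, its methods, and the key format are hypothetical names chosen for illustration, not part of any real payment API.

```typescript
// Hypothetical in-memory stand-in for a refunds table with UNIQUE(idempotency_key).
interface RefundRecord {
  idempotencyKey: string;
  orderId: string;
  amountCents: number;
}

class RefundLedger {
  private readonly byKey = new Map<string, RefundRecord>();

  // Returns true if the refund was applied now, false if this key was already seen.
  refund(idempotencyKey: string, orderId: string, amountCents: number): boolean {
    if (this.byKey.has(idempotencyKey)) {
      return false; // duplicate delivery: acknowledge without refunding again
    }
    this.byKey.set(idempotencyKey, { idempotencyKey, orderId, amountCents });
    return true;
  }

  totalRefundedCents(orderId: string): number {
    let total = 0;
    this.byKey.forEach((record) => {
      if (record.orderId === orderId) total += record.amountCents;
    });
    return total;
  }
}

const ledger = new RefundLedger();
// Deriving the key from the Saga and the step makes every retry of the same compensation collide.
const refundKey = 'saga-42:REFUND_PAYMENT';
const firstAttempt = ledger.refund(refundKey, 'order-1', 5000);
const retryAttempt = ledger.refund(refundKey, 'order-1', 5000);
```

Here `firstAttempt` is `true` and `retryAttempt` is `false`, so the order is refunded once even though the request arrived twice. In a real service the check and the write would need to be a single atomic database operation.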

Practical Application

Example 1: Choreography — Self-Coordination via Events

Choreography is an approach where each service publishes and subscribes to events and moves autonomously, without a central coordinator. I also found this approach really elegant at first. Each service is independent, and all you need is a message broker. The ability to add a new service without touching existing ones was especially appealing.

typescript

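// @EventHandler is a custom decorator representing message broker subscriptions.
// It could wrap NestJS's @EventPattern (@nestjs/microservices) or a plain Kafka consumer;
// @nestjs/event-emitter alone only delivers events inside a single process.
// The db calls below follow TypeORM's DataSource / EntityManager API.

import { DataSource } from 'typeorm';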
 
// 1. Order Service — creates order and publishes event via Outbox pattern
class OrderService {
  constructor(
    private readonly orderRepo: OrderRepository,
    private readonly db: DataSource,
  ) {}
 
  async createOrder(dto: CreateOrderDto): Promise<void> {
    // Handle DB commit and event insertion in the same transaction (Outbox pattern)
    await this.db.transaction(async (em) => {
      const order = await em.save(Order, {
        ...dto,
        status: OrderStatus.PENDING,
      });
 
      await em.save(OutboxEvent, {
        aggregateId: order.id,
        type: 'ORDER_CREATED',
        payload: JSON.stringify({ orderId: order.id, items: dto.items }),
        processedAt: null,
      });
    });
  }
 
  // Compensation on receiving failure event: cancel order
  @EventHandler('INVENTORY_RESERVATION_FAILED')
  async handleReservationFailed(event: ReservationFailedEvent): Promise<void> {
    await this.db.transaction(async (em) => {
      await em.update(Order, event.orderId, { status: OrderStatus.CANCELLED });
 
      await em.save(OutboxEvent, {
        aggregateId: event.orderId,
        type: 'ORDER_CANCELLED',
        payload: JSON.stringify({ orderId: event.orderId, reason: event.reason }),
        processedAt: null,
      });
    });
  }
}
 
// 2. Inventory Service — subscribes to OrderCreated and attempts stock reservation
class InventoryService {
  constructor(private readonly db: DataSource) {}
 
  @EventHandler('ORDER_CREATED')
  async handleOrderCreated(event: OrderCreatedEvent): Promise<void> {
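    // Idempotency fast path: skip if this event has already been processed.
    // Two concurrent deliveries can both pass this check, so the inbox write inside the
    // transaction must also reject duplicates (for example, a plain INSERT against a unique eventId column).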
    const alreadyProcessed = await this.db.getRepository(InboxEvent)
      .existsBy({ eventId: event.eventId });
    if (alreadyProcessed) return;
 
    const available = await this.checkStock(event.items);
 
    if (!available) {
      // Bundle inbox record and failure event in the same transaction
      await this.db.transaction(async (em) => {
        await em.save(InboxEvent, { eventId: event.eventId });
        await em.save(OutboxEvent, {
          aggregateId: event.orderId,
          type: 'INVENTORY_RESERVATION_FAILED',
          payload: JSON.stringify({ orderId: event.orderId, reason: 'OUT_OF_STOCK' }),
          processedAt: null,
        });
      });
      return;
    }
 
    // reserveStock, inbox record, and success event must be bundled in a single transaction.
    // If the process dies after reserveStock but before the inbox record, a retry could double-deduct inventory.
    await this.db.transaction(async (em) => {
      await this.reserveStock(em, event.items);
      await em.save(InboxEvent, { eventId: event.eventId });
      await em.save(OutboxEvent, {
        aggregateId: event.orderId,
        type: 'INVENTORY_RESERVED',
        payload: JSON.stringify({ orderId: event.orderId }),
        processedAt: null,
      });
    });
  }
}

Here is the Choreography event flow illustrated as a diagram.

text

[Order Service]
    │ publishes ORDER_CREATED
    ▼
[Inventory Service] ──── out of stock ───► publishes INVENTORY_RESERVATION_FAILED
    │ success                                              │
    │ publishes INVENTORY_RESERVED          [Order Service] ◄──────────────┘
    ▼                                           publishes ORDER_CANCELLED
[Payment Service]
    │ publishes PAYMENT_PROCESSED
    ▼
[Order Service] → confirm order
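
The diagram and the code above cover the out-of-stock path, but not what happens when payment fails after stock has been reserved. The following is a sketch of that compensation under stated assumptions: `PAYMENT_FAILED` and `INVENTORY_RELEASED` are hypothetical event names, and `InventoryStore` is an in-memory stand-in for the Inventory Service's database.

```typescript
// Hypothetical event shape; PAYMENT_FAILED is an assumed event name, not one defined above.
interface PaymentFailedEvent {
  eventId: string;
  orderId: string;
}

interface Reservation {
  sku: string;
  quantity: number;
}

// In-memory stand-in for the Inventory Service's database.
class InventoryStore {
  readonly available: Record<string, number> = {};
  readonly reservations: Record<string, Reservation[]> = {}; // keyed by orderId
  readonly inbox: Record<string, boolean> = {};              // processed event ids
  readonly outbox: { type: string; orderId: string }[] = [];

  reserve(orderId: string, items: Reservation[]): void {
    for (const item of items) {
      this.available[item.sku] = (this.available[item.sku] ?? 0) - item.quantity;
    }
    this.reservations[orderId] = items;
  }

  // Compensation for a reservation: put the stock back and announce it.
  handlePaymentFailed(event: PaymentFailedEvent): void {
    if (this.inbox[event.eventId]) return; // duplicate delivery
    this.inbox[event.eventId] = true;

    const items = this.reservations[event.orderId];
    if (!items) return; // nothing reserved, or already released

    for (const item of items) {
      this.available[item.sku] = (this.available[item.sku] ?? 0) + item.quantity;
    }
    delete this.reservations[event.orderId];
    // The Order Service would react to this event by cancelling the order.
    this.outbox.push({ type: 'INVENTORY_RELEASED', orderId: event.orderId });
  }
}
```

In a real service the inbox record, the stock update, and the outbox row would share one database transaction, just as in `handleOrderCreated`.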

However, as the number of services grew past 5 or 6, it became increasingly difficult to track "where this event is published and where it's consumed." When debugging, you have to open the Kafka console and trace events one by one, and when an event is lost somewhere in the middle or the processing order gets scrambled, the frustration is considerable. You end up having to open the code of every service just to understand the full flow.
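
One thing that eases this kind of tracing is a shared identifier on every event of a Saga. The sketch below assumes a hypothetical event envelope with a `correlationId` field and shows how events collected from several services could be folded into one timeline per Saga instance.

```typescript
// Hypothetical envelope: every event in one Saga carries the same correlationId.
interface TracedEvent {
  correlationId: string;
  type: string;
  service: string;
  occurredAt: number; // epoch milliseconds
}

// Groups events by correlationId and orders each group by time,
// giving one readable timeline per Saga instance.
function buildTimelines(events: TracedEvent[]): Record<string, string[]> {
  const sorted = events.slice().sort((a, b) => a.occurredAt - b.occurredAt);
  const timelines: Record<string, string[]> = {};
  for (const event of sorted) {
    const line = timelines[event.correlationId] ?? [];
    line.push(`${event.service}:${event.type}`);
    timelines[event.correlationId] = line;
  }
  return timelines;
}
```

Timestamps from different machines are not perfectly ordered, so a production version would more likely rely on a per-Saga sequence number than on wall-clock time.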

Example 2: Orchestration — Centralized Control with an Orchestrator

With Orchestration, a central Saga orchestrator explicitly calls each step and manages state. This approach shines when compensation order matters or there are many steps. Since the entire flow is in one place, "which step failed and what compensation needs to run" is clearly visible from the code alone.

typescript

// Saga state type definitions
type SagaStep = 'RESERVE_INVENTORY' | 'PROCESS_PAYMENT' | 'CONFIRM_ORDER';
type SagaStatus = 'RUNNING' | 'COMPLETED' | 'COMPENSATING' | 'FAILED';
 
interface SagaState {
  id: string;
  orderId: string;
  currentStep: SagaStep;
  completedSteps: SagaStep[]; // cumulatively stored in DB; referenced in reverse during compensation
  status: SagaStatus;
}
 
class OrderSagaOrchestrator {
  constructor(
    private readonly sagaRepo: SagaRepository,
    private readonly inventoryClient: InventoryClient,
    private readonly paymentClient: PaymentClient,
    private readonly orderClient: OrderClient,
  ) {}
 
  async execute(orderId: string): Promise<void> {
    const saga = await this.sagaRepo.create({
      orderId,
      currentStep: 'RESERVE_INVENTORY',
      completedSteps: [],
      status: 'RUNNING',
    });
 
    try {
      // Step 1: Reserve inventory
      await this.inventoryClient.reserve(orderId);
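      // recordStep adds the given step to the completedSteps array and persists it to DB.
      // Even after an orchestrator restart, we can tell which steps completed for compensation.
      // Caveat: a crash between reserve() and recordStep() leaves a reservation that is never
      // recorded. Persisting the step as "started" before the call, and keeping compensations
      // idempotent, is one way to close that gap.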
      await this.sagaRepo.recordStep(saga.id, 'RESERVE_INVENTORY');
 
      // Step 2: Process payment
      await this.paymentClient.process(orderId);
      await this.sagaRepo.recordStep(saga.id, 'PROCESS_PAYMENT');
 
      // Step 3: Confirm order
      await this.orderClient.confirm(orderId);
      await this.sagaRepo.markCompleted(saga.id);
 
    } catch (error) {
      await this.sagaRepo.updateStatus(saga.id, 'COMPENSATING');
      // Re-fetch the latest completedSteps via findById to use for compensation
      await this.compensate(await this.sagaRepo.findById(saga.id));
    }
  }
 
  private async compensate(saga: SagaState): Promise<void> {
    // Run compensation in reverse order only for completed steps
    const compensations: Partial<Record<SagaStep, () => Promise<void>>> = {
      PROCESS_PAYMENT: () => this.paymentClient.refund(saga.orderId),
      RESERVE_INVENTORY: () => this.inventoryClient.release(saga.orderId),
      // CONFIRM_ORDER has no compensation — if we reached this step, everything succeeded
    };
 
    const stepsToCompensate = [...saga.completedSteps].reverse();
 
    for (const step of stepsToCompensate) {
      const compensation = compensations[step];
      if (compensation) {
        try {
          await compensation();
        } catch (err) {
          // On compensation failure: move to Dead Letter Queue or notify for manual intervention
          await this.notifyManualIntervention(saga.id, step, err);
        }
      }
    }
 
    await this.sagaRepo.markFailed(saga.id);
  }
}

The Orchestration flow is much clearer.

text

[OrderSagaOrchestrator]
    │
    ├──► inventoryClient.reserve(orderId)  ✓ → completedSteps: ['RESERVE_INVENTORY']
    │
    ├──► paymentClient.process(orderId)    ✗ failed!
    │
    │    [Begin compensation — reverse order]
    ├──► inventoryClient.release(orderId)  (compensates RESERVE_INVENTORY)
    │
    └──► sagaRepo.markFailed(sagaId)

Advanced: Leveraging Platforms Instead of Rolling Your Own

Example 3: Durable Execution with Temporal

When implementing your own Saga engine, you keep running into questions like "what happens if the orchestrator restarts?" and "which step do we resume from after a network failure?" As of 2025, more teams are delegating this complexity to a platform, with Temporal being one of the most common choices.

typescript

import { proxyActivities, ApplicationFailure } from '@temporalio/workflow';
import type * as activities from './activities';
 
const { reserveInventory, processPayment, cancelReservation, refundPayment } =
  proxyActivities<typeof activities>({
    startToCloseTimeout: '30s',
    retry: {
      maximumAttempts: 3,
      nonRetryableErrorTypes: ['InsufficientStockError', 'InvalidPaymentError'],
    },
  });
 
// Temporal durably manages this workflow's state via event sourcing
export async function orderSagaWorkflow(orderId: string): Promise<void> {
  let inventoryReserved = false;
  let paymentProcessed = false;
 
  try {
    await reserveInventory(orderId);
    inventoryReserved = true;
 
    await processPayment(orderId);
    paymentProcessed = true;
 
  } catch (err) {
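    // Compensation, in reverse order. With only two steps the refund branch cannot fire yet
    // (nothing runs after processPayment); it is here for when later steps are added.
    if (paymentProcessed) {
      await refundPayment(orderId).catch(() => {
        // Handle refund failure separately, e.g., send a notification
      });
    }
    if (inventoryReserved) {
      // Unlike the refund above, a failure here propagates and fails the workflow.
      await cancelReservation(orderId);
    }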
 
    throw ApplicationFailure.create({ message: `Order saga failed: ${orderId}` });
  }
}

Durable Execution: An execution model that persists the state of workflow code (which steps have completed) using event sourcing, allowing resumption from the point of interruption even after a server restart or network failure. Temporal is the canonical implementation; AWS Step Functions offers similar guarantees.
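
To make the replay idea concrete, here is a toy model. It is not how Temporal is implemented; it only illustrates the principle that a recorded history lets re-run workflow code skip steps that already completed. `ReplayContext`, `StepHistory`, and `orderWorkflow` are hypothetical names, and the sketch is synchronous for brevity.

```typescript
// Toy model of durable execution: NOT Temporal's implementation, only the replay idea.
type StepHistory = { step: string; result: string }[];

class ReplayContext {
  private cursor = 0;
  private readonly history: StepHistory;
  readonly executed: string[] = []; // steps that actually ran in this process

  constructor(history: StepHistory) {
    this.history = history;
  }

  // Runs the step once; on replay, returns the recorded result without re-running it.
  run(step: string, fn: () => string): string {
    const recorded = this.history[this.cursor];
    if (recorded !== undefined) {
      if (recorded.step !== step) {
        throw new Error(`Non-deterministic workflow: expected ${recorded.step}, got ${step}`);
      }
      this.cursor++;
      return recorded.result;
    }
    const result = fn();
    this.history.push({ step, result }); // a real engine persists this durably
    this.cursor++;
    this.executed.push(step);
    return result;
  }
}

function orderWorkflow(ctx: ReplayContext, orderId: string): string {
  const reservation = ctx.run('reserveInventory', () => `reserved:${orderId}`);
  const payment = ctx.run('processPayment', () => `charged:${orderId}`);
  return `${reservation},${payment}`;
}
```

If the process crashed after `reserveInventory` was recorded, running `orderWorkflow` again with the same history returns the stored reservation result and only executes `processPayment`. This is also why workflow code has to be deterministic: the replayed steps must line up with the history.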

If you're confident implementing your own Saga engine, building it yourself is a perfectly valid choice. But for a first adoption, starting with a prototype on Temporal Cloud's free tier or AWS Step Functions can significantly reduce implementation cost.

Trade-off Analysis

Choreography vs Orchestration Comparison

Criterion	Choreography	Orchestration
Service coupling	Low — no direct inter-service dependencies	High — orchestrator must know each service
Overall flow visibility	Low — logic scattered across multiple files	High — full state visible in one file
Debugging difficulty	Grows with number of services	Relatively straightforward
Complex compensation ordering	Difficult to manage	Explicitly controllable
Throughput & scalability	High — well-suited for async processing	Orchestrator can become a bottleneck
Independent deployment	Each team can deploy independently	Requires deployment when orchestrator changes

Decision Criteria by Situation

Honestly, there's no answer of "you must always use this one." Use the criteria below to choose what fits your situation.

Situation	Recommended Approach
3 or fewer services, simple success/failure flow	Choreography
Compensation order matters or 4+ steps	Orchestration
Service teams need to deploy independently	Choreography
Business-critical workflow requiring audit logs	Orchestration
Frequently adding new services to existing flow	Choreography
Complex rollback logic requiring manual intervention on failure	Orchestration

Drawbacks and Caveats

Item	Description	Mitigation
Intermediate State Visibility	Other Sagas may read intermediate state data between steps	Review business requirements and decide whether to allow. Consider the Semantic Lock pattern if needed (marking in-progress records with something like `status: PROCESSING` so other Sagas can filter them out)
Compensation failure	Compensating transactions themselves can fail	Dead Letter Queue + manual intervention notification + idempotent compensations
Orchestrator single point of failure	In Orchestration, an orchestrator failure halts the Saga	Persist orchestrator state to DB; design for recovery on restart
Event explosion	In Choreography, tracking event relationships becomes harder as service count grows	Document an Event Catalog; track with Correlation IDs
Non-compensatable actions	Emails, SMS messages cannot be undone	Place them as the last step in the Saga, or use a "scheduled send" approach to defer delivery
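
The Semantic Lock mitigation from the first row can be sketched as follows. This is a minimal in-memory illustration, assuming a hypothetical `lockedBySaga` column next to the status; in a database the acquire step would be a single conditional UPDATE so that it is atomic.

```typescript
type OrderStatus = 'PROCESSING' | 'CONFIRMED' | 'CANCELLED';

interface OrderRow {
  id: string;
  status: OrderStatus;
  lockedBySaga: string | null; // hypothetical column marking the owning Saga
}

// Other Sagas (or queries) skip rows that are still mid-flight.
function visibleToOtherSagas(orders: OrderRow[]): OrderRow[] {
  return orders.filter((o) => o.status !== 'PROCESSING');
}

// A Saga takes the semantic lock only if no other Saga holds it.
function tryAcquire(order: OrderRow, sagaId: string): boolean {
  if (order.status === 'PROCESSING' && order.lockedBySaga !== sagaId) return false;
  order.status = 'PROCESSING';
  order.lockedBySaga = sagaId;
  return true;
}

// Completing or compensating the Saga clears the lock by moving to a final status.
function release(order: OrderRow, sagaId: string, finalStatus: 'CONFIRMED' | 'CANCELLED'): void {
  if (order.lockedBySaga !== sagaId) throw new Error('lock is held by another saga');
  order.status = finalStatus;
  order.lockedBySaga = null;
}
```

Whether a blocked Saga should fail fast or wait and retry is a business decision; the sketch only reports that the lock could not be taken.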

Outbox Pattern: A pattern for atomically handling DB commits and message publishing. Events are saved to an Outbox table within the same transaction as the business data, and a separate process reads and publishes them to the message broker. It prevents situations where a DB commit succeeds but the event publication is missed at any Saga step, making it practically mandatory.
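
The "separate process" half of the pattern can be sketched like this. It is a simplified, synchronous illustration: `OutboxRow` mirrors the `processedAt` column used in the examples above, while `relayOutbox` and the `publish` callback are hypothetical stand-ins for a polling job and a broker client.

```typescript
interface OutboxRow {
  id: number;
  type: string;
  payload: string;
  processedAt: Date | null;
}

// Publishes pending rows in id order and marks each one only after the broker accepted it.
// `publish` is a hypothetical stand-in for a broker client; it throws when the broker is unavailable.
function relayOutbox(
  rows: OutboxRow[],
  publish: (type: string, payload: string) => void,
  now: () => Date = () => new Date(),
): number {
  let published = 0;
  const pending = rows
    .filter((r) => r.processedAt === null)
    .sort((a, b) => a.id - b.id);
  for (const row of pending) {
    try {
      publish(row.type, row.payload);
    } catch {
      break; // stop here to preserve ordering; the next poll retries from this row
    }
    row.processedAt = now();
    published++;
  }
  return published;
}
```

Because a crash between `publish` and setting `processedAt` causes the row to be sent again on the next poll, this relay is at-least-once. That is the reason consumers still need an Inbox or another idempotency check.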

The Most Common Mistakes in Production

1. Not applying idempotency to compensating transactions: Network timeouts can cause the same compensation request to be delivered twice. Including an idempotency_key in requests and using an Inbox table to prevent duplicate processing is an effective approach.
2. Placing email/SMS delivery in the middle of a Saga: If you send an "order confirmation email" immediately after inventory reservation and the payment fails causing the order to be cancelled, there's no way to recall an already-sent email. External notifications should always be placed as the last step of the Saga.
3. Not persisting Saga state: If the orchestrator restarts or the network goes down and you don't know which steps have completed, you can't properly execute compensation. It's critical to record state like completedSteps to the DB and design for resumption on restart.
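
For the last point, a rough sketch of what "resumption on restart" can mean is shown below. It assumes the persisted Saga record also tracks which compensations have already run; `compensatedSteps` is a hypothetical extra column, and both function names are illustrative.

```typescript
type Step = 'RESERVE_INVENTORY' | 'PROCESS_PAYMENT' | 'CONFIRM_ORDER';
type Status = 'RUNNING' | 'COMPLETED' | 'COMPENSATING' | 'FAILED';

interface PersistedSaga {
  id: string;
  status: Status;
  completedSteps: Step[];
  compensatedSteps: Step[]; // hypothetical extra column: compensations already done
}

// Steps that have a compensating action; CONFIRM_ORDER has none.
const COMPENSABLE: Step[] = ['RESERVE_INVENTORY', 'PROCESS_PAYMENT'];

// On startup, pick up every Saga that never reached a terminal status.
function findUnfinished(sagas: PersistedSaga[]): PersistedSaga[] {
  return sagas.filter((s) => s.status === 'RUNNING' || s.status === 'COMPENSATING');
}

// For a Saga found mid-compensation, returns what is still left to undo, newest step first.
function remainingCompensations(saga: PersistedSaga): Step[] {
  return saga.completedSteps
    .filter((s) => COMPENSABLE.indexOf(s) !== -1)
    .filter((s) => saga.compensatedSteps.indexOf(s) === -1)
    .reverse();
}
```

What to do with a Saga found in `RUNNING` is a design choice: resume it forward from the last recorded step, or move it to `COMPENSATING` and undo it. Either way, the steps being retried need to be idempotent.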

Closing Thoughts

The biggest realization I had while adopting the Saga pattern is that it doesn't give you perfect atomicity. Rather, it's about designing how gracefully you handle failures in exchange for giving that atomicity up. More important than whether you choose Choreography or Orchestration is ensuring that your team clearly understands and accepts the trade-offs that come with that choice: coupling, visibility, and compensation complexity.

Three steps you can start with right now:

1. Draw the distributed transaction flow for your current services. Write out each step like Order → Inventory → Payment, and next to each step fill in a "compensation column" describing how you would undo that step if a later one failed. The gaps in your design will start to become visible.
2. Implement a simple 2–3 step flow first with Choreography. Build the ORDER_CREATED → INVENTORY_RESERVED → PAYMENT_PROCESSED flow with Kafka or RabbitMQ, and apply the Outbox pattern at each step as you go.
3. Consider switching to Orchestration when the flow reaches 4+ steps or compensation ordering becomes important. Before implementing your own orchestrator, try prototyping on Temporal Cloud's free tier or AWS Step Functions first, as it can dramatically reduce implementation cost.
