Yige

Yige

Build

Distributed Transactions

Distributed Transactions#

Content from

  1. Geek Time Column: 《Principles and Algorithm Analysis of Distributed Technology》
  2. If someone asks you about distributed transactions, throw this article to them
  3. Understanding Distributed Transactions and Two-Phase Commit, Three-Phase Commit

A transaction is actually a bounded sequence of operations with a clear start and end marker, which must either be fully executed or completely fail and roll back. A distributed transaction is a transaction that runs in a distributed system, composed of multiple local transactions.

Basic Characteristics of Transactions ACID#

  • Atomicity
  • Consistency
  • Isolation
  • Durability

CAP and BASE Theories#

CAP#

  • Consistency: For a specific client, read operations can return the latest write operations. For data distributed across different nodes, if data is updated on one node, and other nodes can read this latest data, it is called strong consistency. If any node fails to read it, it is considered distributed inconsistency.
  • Availability: Non-faulty nodes return reasonable responses (not error or timeout responses) within a reasonable time. The two keys to availability are reasonable time and reasonable response. Reasonable time means requests cannot be blocked indefinitely and should return within a reasonable time. Reasonable response means the system should clearly return results and the results are correct; for example, it should return 50, not 40.
  • Partition tolerance: The system can continue to operate when a network partition occurs. For example, in a cluster with multiple machines, if one machine has network issues, the cluster can still function normally.

The CAP theorem tells us that a distributed system cannot simultaneously satisfy all three basic requirements: Consistency (C), Availability (A), and Partition tolerance (P), and can satisfy at most two of them. Reference links:

BASE#

The BASE theory includes Basically Available, Soft State, and Eventual Consistency.

  • Basically Available: When a distributed system fails, it allows for the loss of some functionality. For example, during major sales events like e-commerce's 618, some non-core functionalities may be downgraded.
  • Soft State: In flexible transactions, the system is allowed to have intermediate states that do not affect overall system availability. For example, in database read-write separation, the write database synchronizes to the read database (master to slave) with some delay, which is a form of soft state.
  • Eventual Consistency: During the operation of a transaction, inconsistencies may arise due to synchronization delays, but ultimately, the data will be consistent.

BASE addresses the lack of network delay in CAP theory by using soft state and eventual consistency to ensure consistency after delays. BASE is the opposite of ACID; it is fundamentally different from the strong consistency model of ACID, sacrificing strong consistency for availability and allowing data to be inconsistent for a period of time, but ultimately reaching a consistent state.

Methods for Implementing Distributed Transactions#

  1. Two-Phase Commit Protocol based on XA protocol;
  2. Three-Phase Commit Protocol;
  3. TCC Transaction Mechanism;
  4. Local Message Table;
  5. Saga Transaction.

Among these, the Two-Phase Commit Protocol based on XA protocol and the Three-Phase Commit Protocol adopt strong consistency, adhering to ACID, while the message-based eventual consistency methods adopt eventual consistency, adhering to BASE theory.

1. Two-Phase Commit Method Based on XA Protocol (2PC)#

The principle of XA for implementing distributed transactions is similar to centralized algorithms in distributed mutual exclusion, with a transaction manager acting as the coordinator, responsible for committing and rolling back each local resource.

Process#

image.png

First Phase: The transaction manager requests each database involved in the transaction to pre-commit this operation and reflects whether it can be committed.

Second Phase: The transaction coordinator requests each database to commit or roll back the data.

Disadvantages#

  • Synchronous Blocking Issue: The two-phase commit algorithm blocks all participating nodes during execution. This means that when a local resource manager occupies a critical resource, other resource managers trying to access the same critical resource will be in a blocked state.
  • Single Point of Failure Issue: The XA-based two-phase commit algorithm is similar to centralized algorithms; once the transaction manager fails, the entire system comes to a standstill. Especially during the commit phase, if the transaction manager fails, resource managers will lock transaction resources while waiting for messages from the manager, causing the entire system to be blocked.
  • Data Inconsistency Issue: During the commit phase, if a local network anomaly occurs after the coordinator sends a DoCommit request to participants, or if the coordinator fails while sending the commit request, only some participants may receive the commit request and execute the commit operation, while others that did not receive the request cannot execute the transaction commit. This leads to data inconsistency in the entire distributed system.

2. Three-Phase Commit Protocol Method (3PC)#

The Three-Phase Commit Protocol (3PC) is an improvement over the Two-Phase Commit (2PC), introducing a timeout mechanism and a preparation phase:

  • Timeout Mechanism: If the coordinator or participants do not receive responses from other nodes within the specified time, they will choose to commit or abort the entire transaction based on the current state.
  • Preparation Phase: Before the commit phase, a pre-commit phase is added to eliminate some inconsistent situations, ensuring that the states of all participating nodes are consistent before the final commit.

Process#

3PC divides the commit phase of 2PC into two, resulting in three phases: CanCommit, PreCommit, and DoCommit.
image.png

  • CanCommit:
    The CanCommit phase is similar to the voting phase of 2PC: the coordinator sends a request operation (CanCommit request) to participants, asking whether they can perform the transaction commit operation, and then waits for their responses. Participants reply Yes if they can proceed with the transaction; otherwise, they reply No.

  • PreCommit: The coordinator decides whether to proceed with the PreCommit operation based on the responses from participants.

    1. If all participants reply "Yes," the coordinator will execute the pre-execution of the transaction.
    2. If any participant sends a "No" message to the coordinator, or if the coordinator does not receive responses from participants after a timeout, it will abort the transaction.
  • DoCommit: The DoCommit phase performs the actual transaction commit, entering the execution commit phase or transaction abort phase based on the messages sent by the coordinator during the PreCommit phase.

Comparison with 2PC#

  • Compared to 2PC, 3PC sets timeout periods for both the coordinator and participants, while 2PC only has a timeout mechanism for the coordinator. This avoids the issue where participants cannot release resources when they cannot communicate with the coordinator for a long time (if the coordinator fails), as participants will automatically perform a local commit after a timeout, thus releasing resources. This mechanism also indirectly reduces the blocking time and scope of the entire transaction.
  • In addition to the preparation and commit phases of 2PC, a pre-commit phase (PreCommit) is added, which acts as a buffer phase to ensure that the states of all participating nodes are consistent before the final commit phase.

3. TCC Transaction Mechanism#

TCC (Try-Confirm-Cancel) is also known as compensating transactions. Its core idea is: "For each operation, a corresponding confirmation and compensation (rollback operation) must be registered." It consists of three operations:

  • Try Phase: Mainly for detecting the business system and reserving resources.
  • Confirm Phase: Confirming the execution of business operations.
  • Cancel Phase: Cancelling the execution of business operations.

image.png

Applicable Scenarios

  • Activities with strong isolation and strict consistency requirements.
  • Business operations with relatively short execution times.

4. Local Message Table#

In eBay's distributed system architecture, the core idea for solving consistency issues is: to asynchronously execute transactions that need to be processed in a distributed manner through messages or logs. Messages or logs can be stored in local files, databases, or message queues, and then retry failures based on business rules.
Reference: Base: An Acid Alternative

5. Saga Transaction#

Saga is a concept mentioned in a database ethics paper 30 years ago. Its core idea is to split long transactions into multiple local short transactions, coordinated by a Saga transaction coordinator. If it ends normally, it completes normally; if any step fails, it calls the compensation operation in reverse order.

It is important to note that in the saga model, isolation cannot be guaranteed because resources are not locked, and other transactions can still overwrite or affect the current transaction. A solution from Huawei can be referenced: introducing a session and locking mechanism from the business layer to ensure serializable operations on resources. Alternatively, resources can be isolated by pre-freezing funds at the business level, allowing the latest updates to be obtained by timely reading the current state during business operations.

Specific examples: Refer to Huawei's servicecomb.

Expansion#

Rigid Transactions vs. Flexible Transactions#

  • Rigid transactions adhere to the ACID principles and have strong consistency, such as database transactions.
  • Flexible transactions implement eventual consistency using different methods based on different business scenarios, meaning we can make some trade-offs based on business characteristics and tolerate certain periods of data inconsistency.

In summary, unlike rigid transactions, flexible transactions allow for data inconsistency across different nodes for a certain period but require eventual consistency. The eventual consistency of flexible transactions adheres to the BASE theory.

Relationship Between Distributed Mutual Exclusion and Distributed Transactions#

Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.