HBase Series - HBase Read and Write Process#
Data Writing Process#
Client-Side Writing Process#
- After the user submits a put request, the HBase client decides whether to submit it directly to the server based on the `autoflush` setting (`true`/`false`, default `true`). If `autoflush=false`, the request is added to a local buffer and submitted only after the buffer exceeds a certain threshold (default 2 MB, configurable via the configuration files). This improves write performance, but buffered requests can be lost if the client crashes.
- Before submitting, HBase locates the region server responsible for the rowkey by looking it up in the `meta` metadata table. This lookup goes through HConnection's `locateRegion` method. For a batch request, the rowkeys are also grouped by `HRegionLocation`, with each group corresponding to one RPC request.
- HBase constructs a remote RPC request `MultiServerCallable<Row>` for each `HRegionLocation` and executes the call through `rpcCallerFactory.<MultiResponse> newCaller()`; failure handling and retries are not covered here. The client's part of the write ends at this point.
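The client-side buffering behavior described above can be sketched with a simplified model. This is not the real HBase client; the class name, the byte-size accounting, and the batch bookkeeping are illustrative assumptions, with only the 2 MB default threshold taken from the text:

```python
class BufferedClient:
    """Toy model of HBase client-side write buffering (autoflush=false)."""

    FLUSH_THRESHOLD = 2 * 1024 * 1024  # default local buffer threshold (~2 MB)

    def __init__(self, autoflush=True):
        self.autoflush = autoflush
        self.buffer = []           # pending puts held on the client
        self.buffered_bytes = 0
        self.submitted = []        # batches actually sent to the server

    def put(self, row):
        if self.autoflush:
            self.submitted.append([row])  # submit each request immediately
            return
        self.buffer.append(row)
        self.buffered_bytes += len(row)
        if self.buffered_bytes > self.FLUSH_THRESHOLD:
            self.flush()          # threshold exceeded: one batch submission

    def flush(self):
        if self.buffer:
            self.submitted.append(self.buffer)
            self.buffer, self.buffered_bytes = [], 0

client = BufferedClient(autoflush=False)
for _ in range(3):
    client.put(b"x" * 1024 * 1024)  # three 1 MB puts
# the threshold is crossed on the third put, so all three go out in one batch
```

The risk mentioned in the text is visible here: anything still sitting in `buffer` when the process dies was never submitted.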
Server-Side Writing Process#
After the server-side Region Server receives the client's write request, it first deserializes it into a Put object, then runs checks such as whether the region is read-only and whether the memstore size exceeds `blockingMemstoreSize`, and then performs the following core operations:
- Acquire row lock and Region update shared lock: HBase uses row locks to ensure that updates to the same row are mutually exclusive, guaranteeing atomicity of updates: an update either fully succeeds or fully fails.
- Start write transaction: Obtain a write number used to implement MVCC, which enables lock-free reads and improves read performance while preserving read-write consistency.
- Write to memstore cache: Each column family in HBase corresponds to a store holding that column family's data, and each store has a write cache called memstore. HBase does not write data directly to disk; it first writes to the memstore and flushes to disk only when the cache reaches a certain size.
- Append HLog: HBase uses the WAL mechanism to ensure data reliability: write to the log first, then to the cache, so that even after a crash the data can be restored by replaying the HLog. This step wraps the data in a WALEdit object and appends it to the HLog sequentially, without executing a sync operation. Version 0.98 adopted a new write-thread model for HLog writing, greatly improving overall update performance.
- Release row lock and shared lock
- Sync HLog: The HLog is actually synced to HDFS. Executing the sync after releasing the row lock minimizes lock-holding time and improves write performance. If the sync fails, a rollback operation removes the data already written to the memstore.
- End write transaction: At this point the update becomes visible to other read requests and takes effect.
- Flush memstore: When the write cache reaches 64 MB, a flush thread writes the data to disk. Each memstore flush creates a storefile; when the number of storefiles reaches a certain level, a compaction operation is performed on them. The purpose of compaction: merge files, remove expired data and redundant versions, and improve read and write efficiency.
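The ordering of the core steps above (append the HLog before the memstore write, sync after the row lock is released, roll back the memstore if the sync fails, make the update visible only when the transaction ends) can be sketched as a simplified model. All names and structures here are illustrative, not HBase internals:

```python
class Region:
    """Toy model of the server-side write path: append WAL -> write memstore
    -> release row lock -> sync WAL -> roll back memstore if sync fails."""

    def __init__(self):
        self.wal = []          # appended (but possibly unsynced) log entries
        self.synced_wal = []   # entries durably synced to "HDFS"
        self.memstore = {}
        self.read_point = 0    # MVCC: highest write number visible to readers

    def put(self, row, value, sync_ok=True):
        write_number = self.read_point + 1            # start write transaction
        self.wal.append((write_number, row, value))   # append HLog, no sync yet
        self.memstore[row] = (write_number, value)    # write memstore
        # ... row lock would be released here, before the expensive sync ...
        if not sync_ok:                               # sync HLog failed
            del self.memstore[row]                    # roll back memstore write
            self.wal.pop()
            return False
        self.synced_wal.append(self.wal[-1])
        self.read_point = write_number                # end transaction: visible
        return True

    def get(self, row):
        entry = self.memstore.get(row)
        if entry and entry[0] <= self.read_point:     # MVCC visibility check
            return entry[1]
        return None

r = Region()
r.put("row1", "v1")                 # successful write, visible afterwards
r.put("row2", "v2", sync_ok=False)  # failed sync: rolled back, never visible
```

The MVCC `read_point` is why a reader never sees a half-finished update: the new write number only becomes visible in the final "end write transaction" step.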
WAL Persistence Levels#
- SKIP_WAL: write only to the cache, not to the HLog. Best performance, but data may be lost; not recommended.
- ASYNC_WAL: write data to the HLog asynchronously.
- SYNC_WAL: write data to the log file synchronously. Note that the data is only written to the file system and is not necessarily persisted to disk.
- FSYNC_WAL: write data to the log file synchronously and force it to disk. This is the strictest log-writing level, guaranteeing no data loss, but with relatively poor performance.
- USE_DEFAULT: if the user does not specify a persistence level, HBase uses SYNC_WAL by default.
Users can set the WAL persistence level on the client side, e.g.: `put.setDurability(Durability.SYNC_WAL);`
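What each level means for the log write can be summarized as three flags. This is a simplified model, not HBase code; the `(append, sync, fsync)` mapping simply restates the descriptions above:

```python
# Per durability level: (append to WAL, sync to file system, fsync to disk).
DURABILITY = {
    "SKIP_WAL":    (False, False, False),  # fastest, data loss possible
    "ASYNC_WAL":   (True,  False, False),  # append now, sync later
    "SYNC_WAL":    (True,  True,  False),  # synced to FS, not forced to disk
    "FSYNC_WAL":   (True,  True,  True),   # forced all the way to disk
    "USE_DEFAULT": (True,  True,  False),  # falls back to SYNC_WAL
}

def wal_behavior(level):
    append, sync, fsync = DURABILITY[level]
    return {"append": append, "sync": sync, "fsync": fsync}
```

Reading the table top to bottom, each level trades a little more write latency for a stronger durability guarantee.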
Reading Process#
The process when a client reads or writes data in HBase for the first time:

- The client obtains from ZooKeeper the Region Server hosting the META table;
- The client accesses that Region Server, queries the META table for the Region Server serving the target row key, and caches this information together with the location of the META table;
- The client retrieves the data from the Region Server where the row key is located.

On subsequent reads, the client obtains the Region Server for the row key from its cache, so it does not need to query the META table again unless the region has moved and the cache has become invalid, in which case it re-queries the META table and updates the cache.
Note: The META table is a special table in HBase that stores the location information of all regions, while the location of the META table itself is stored in ZooKeeper.
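The cache-then-invalidate pattern described above can be sketched as a toy model; the class and attribute names are illustrative, not the real HBase client's:

```python
class MetaCache:
    """Toy model of client-side region location caching."""

    def __init__(self, meta_table):
        self.meta_table = meta_table  # rowkey -> region server (stands in for META)
        self.cache = {}
        self.meta_lookups = 0         # how many times we had to hit META

    def locate(self, rowkey):
        if rowkey not in self.cache:          # cache miss: query the META table
            self.meta_lookups += 1
            self.cache[rowkey] = self.meta_table[rowkey]
        return self.cache[rowkey]             # cache hit: no META round trip

    def invalidate(self, rowkey):
        """Called when a request fails because the region has moved."""
        self.cache.pop(rowkey, None)

meta = {"row1": "rs-a"}
conn = MetaCache(meta)
conn.locate("row1")      # first read: META lookup
conn.locate("row1")      # second read: served from the cache
meta["row1"] = "rs-b"    # region moves to another server
conn.invalidate("row1")  # stale entry dropped after a failed request
conn.locate("row1")      # re-queries META and picks up the new location
```

Only the first access and the post-move access touch META; everything in between is answered from the local cache.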
For a more detailed account of the data reading process, refer to:
HBase Principles - Data Reading Process Analysis
HBase Principles - The Belated 'Data Reading Process' Partial Details
HBase Query Methods#
- Full table scan: `scan 'tableName'`
- Single row query by rowkey: `get 'tableName', '1'`
- Range scan by rowkey: `scan 'tableName', {STARTROW=>'1', STOPROW=>'2'}`
setCaching() and setBatch() methods
Caching sets the number of rows the server returns to the client in one RPC, while Batch sets the maximum number of columns returned in each Result.
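The interplay of the two settings can be sketched numerically. This is a simplified model under the common description that, when batching is enabled, caching groups Results (not whole rows) into RPC round trips; the function name and parameters are illustrative:

```python
def scan_results(num_rows, cols_per_row, batch, caching):
    """Toy model: batch splits each row's columns into Results of at most
    `batch` columns; caching groups Results into RPC round trips."""
    results = []
    for _ in range(num_rows):
        for start in range(0, cols_per_row, batch):
            # number of columns carried by this Result
            results.append(min(batch, cols_per_row - start))
    # group the Results into RPCs of `caching` Results each
    rpcs = [results[i:i + caching] for i in range(0, len(results), caching)]
    return results, rpcs

# two rows of 17 columns, batch=5, caching=3:
# each row yields Results of 5, 5, 5, 2 columns -> 8 Results -> 3 RPCs
results, rpcs = scan_results(num_rows=2, cols_per_row=17, batch=5, caching=3)
```

Raising caching cuts round trips at the cost of client memory; lowering batch bounds the size of each Result for very wide rows.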