
HBase Series - HBase Read and Write Process#


Data Write Process#

Client Write Process#

  1. After the user submits a put request, the HBase client decides, based on the autoflush setting (default true), whether to send it to the server immediately. If autoflush=false, the request is added to a local buffer and submitted only once the buffer exceeds a threshold (2 MB by default, configurable). Buffering improves write performance, but buffered requests are lost if the client crashes (see the client-side sketch after this list).

  2. Before submitting, HBase locates the region server that hosts the rowkey by looking it up in the metadata table (hbase:meta). This lookup goes through HConnection's locateRegion method. For a batch request, the rowkeys are also grouped by HRegionLocation, with each group corresponding to one RPC request.

  3. HBase constructs a remote RPC request MultiServerCallable<Row> for each HRegionLocation and executes the call through rpcCallerFactory.<MultiResponse> newCaller(); retry on failure and error handling are not covered here. The client's part of the write ends at this point.
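A minimal client-side sketch of steps 1 and 2, assuming the modern HBase Java client API: BufferedMutator takes the place of the old autoflush=false buffering, and RegionLocator exposes the same meta lookup that locateRegion performs internally. The table, column family, and rowkey names are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class ClientWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName table = TableName.valueOf("test_table");   // hypothetical table
        byte[] rowkey = Bytes.toBytes("row-1");

        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            // Step 2: the meta lookup the client performs before sending a put.
            try (RegionLocator locator = conn.getRegionLocator(table)) {
                HRegionLocation loc = locator.getRegionLocation(rowkey);
                System.out.println("rowkey is hosted on: " + loc.getServerName());
            }

            // Step 1: buffer puts client-side and flush once the buffer exceeds 2 MB,
            // mirroring the autoflush=false behaviour described above.
            BufferedMutatorParams params =
                new BufferedMutatorParams(table).writeBufferSize(2 * 1024 * 1024);
            try (BufferedMutator mutator = conn.getBufferedMutator(params)) {
                Put put = new Put(rowkey);
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
                mutator.mutate(put);   // queued locally, not yet sent to the region server
                mutator.flush();       // explicit flush; otherwise sent when the buffer fills
            }
        }
    }
}
```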

Server Write Process#

After the server-side Region Server receives the client's write request, it first deserializes it into a Put object, then performs checks such as whether the region is read-only and whether the memstore size exceeds blockingMemstoreSize, and then carries out the following core operations (a simplified code sketch of the whole sequence follows the list):

  1. Acquire the row lock and the Region update shared lock: HBase uses row locks to ensure that updates to the same row are mutually exclusive, guaranteeing the atomicity of each update: it either fully succeeds or fully fails.

  2. Start a write transaction: acquire a write number to implement MVCC, which allows reads without locking, improving read performance while keeping reads and writes consistent.

  3. Write to the memstore: each column family in HBase corresponds to a store that holds that column family's data, and each store has a write cache called the memstore. HBase does not write data to disk directly; it writes to the memstore first and flushes to disk only when the cache reaches a certain size.

  4. Append to the HLog: HBase uses the WAL mechanism to guarantee data reliability: every update is recorded in the log so that, even after a crash, the data can be recovered by replaying the HLog. This step packages the update into a WALEdit object and appends it sequentially to the HLog, without performing a sync. Since version 0.98, HBase has used a new write-thread model for HLog writes, which greatly improves overall update performance.

  5. Release row lock and shared lock

  6. Sync the HLog: the HLog is actually synced to HDFS. The sync is performed after the row lock has been released, to shorten the time the lock is held and improve write performance. If the sync fails, a rollback removes the data already written to the memstore.

  7. End the write transaction: at this point the update becomes visible to other read requests and takes effect.

  8. Flush the memstore: when the write cache reaches 64 MB, a flush thread is started to write the data to disk. Each memstore flush creates a storefile, and once the number of storefiles reaches a threshold, a compaction is run on them. The purpose of compaction is to merge files, clear out expired data and redundant versions, and improve read and write efficiency.
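A simplified, hypothetical sketch of the sequence above. The MemStore, Wal, and Mvcc types are stand-ins rather than HBase internals; only the ordering of the eight steps follows the description.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RegionWriteSketch {

    // Stand-in interfaces; the real HBase internals are far more involved.
    interface MemStore {
        void add(byte[] row, byte[] value);
        void rollback(byte[] row);
        long sizeInBytes();
        void flushToStoreFile();
    }

    interface Wal {
        void append(byte[] walEdit);
        void sync() throws Exception;
    }

    interface Mvcc {
        long beginWrite();
        void completeWrite(long writeNumber);
    }

    static final long FLUSH_THRESHOLD = 64L * 1024 * 1024;   // 64 MB, per step 8

    static void put(byte[] row, byte[] value,
                    ReentrantLock rowLock, ReentrantReadWriteLock regionLock,
                    MemStore memstore, Wal hlog, Mvcc mvcc) throws Exception {
        rowLock.lock();                        // 1. row lock + region shared lock
        regionLock.readLock().lock();
        long writeNumber = mvcc.beginWrite();  // 2. start the write transaction (MVCC)
        try {
            memstore.add(row, value);          // 3. write into the memstore
            hlog.append(value);                // 4. append the WALEdit, no sync yet
        } finally {
            regionLock.readLock().unlock();    // 5. release locks before the expensive sync
            rowLock.unlock();
        }
        try {
            hlog.sync();                       // 6. sync the HLog to HDFS
        } catch (Exception e) {
            memstore.rollback(row);            //    roll back the memstore if the sync fails
            throw e;
        }
        mvcc.completeWrite(writeNumber);       // 7. end the transaction; the update becomes visible

        if (memstore.sizeInBytes() >= FLUSH_THRESHOLD) {
            memstore.flushToStoreFile();       // 8. flush once the cache reaches 64 MB
        }
    }
}
```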

WAL Persistence Levels#

  • SKIP_WAL: Only write to cache, not to HLog. This uses only memory, providing good performance but is prone to data loss; not recommended.
  • ASYNC_WAL: Asynchronously write data to HLog.
  • SYNC_WAL: Synchronously write data to log files. It is important to note that data is only written to the file system and has not truly been flushed to disk.
  • FSYNC_WAL: Synchronously write data to log files and force a flush to disk. This is the strictest log writing level, ensuring data will not be lost, but performance is relatively poor.
  • USE_DEFAULT: If the user does not specify a persistence level, HBase uses the SYNC_WAL level to persist data.

Users can set the WAL persistence level through the client, with the code: put.setDurability(Durability.SYNC_WAL);
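A minimal example of setting the durability on a put with the standard Java client; the table, column family, and qualifier names are illustrative.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DurabilityExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("test_table"))) {   // hypothetical table
            Put put = new Put(Bytes.toBytes("row-1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
            // Per-mutation WAL level; without this call the put uses USE_DEFAULT,
            // which falls back to SYNC_WAL.
            put.setDurability(Durability.SYNC_WAL);
            table.put(put);
        }
    }
}
```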

Read Process#

The process the client follows the first time it reads or writes data in HBase:

  1. The client retrieves the Region Server where the META table is located from Zookeeper.
  2. The client accesses that Region Server and queries the META table for the Region Server that hosts the row key it wants to access, then caches this information together with the location of the META table.
  3. The client retrieves data from the Region Server where the row key is located.

On subsequent reads, the client obtains the Region Server for the row key from its cache, so it does not need to query the META table again unless the Region has moved and invalidated the cache, in which case it re-queries and updates the cache.

Note: The META table is a special table in HBase that stores the location information of all Regions, and the location information of the META table itself is stored in ZooKeeper.
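A minimal read example with the standard Java client. The first get triggers the meta lookup described above, and the resolved region location is cached inside the Connection, so later requests for the same region skip the round trip. Table, column family, and rowkey names are illustrative.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("test_table"))) {   // hypothetical table
            Get get = new Get(Bytes.toBytes("row-1"));
            Result result = table.get(get);   // first call: ZooKeeper -> META -> data region server
            byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"));
            System.out.println(value == null ? "not found" : Bytes.toString(value));

            // Second read of the same row: the region location comes from the client cache,
            // so no META query is needed unless the region has moved.
            table.get(new Get(Bytes.toBytes("row-1")));
        }
    }
}
```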


For a more detailed reading data process, refer to:

HBase Principles - Data Read Process Analysis

HBase Principles - The Belated 'Data Read Process' Partial Details

HBase Query Methods#

  • Full table query: scan 'tableName'

  • Single row query based on rowkey: get 'tableName', '1'

  • Range scan based on rowkey: scan 'tableName', {STARTROW => '1', STOPROW => '2'}

setCaching() and setBatch() methods
setCaching() sets the number of rows the server returns in one call, while setBatch() sets the number of columns returned at a time.
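A range-scan sketch combining the shell examples above with setCaching()/setBatch(), assuming the HBase 2.x Java client (withStartRow/withStopRow; older clients use setStartRow/setStopRow). Table and column names are illustrative.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("test_table"))) {   // hypothetical table
            Scan scan = new Scan()
                .withStartRow(Bytes.toBytes("1"))   // STARTROW => '1'
                .withStopRow(Bytes.toBytes("2"));   // STOPROW  => '2'
            scan.setCaching(100);  // rows fetched from the server per RPC
            scan.setBatch(10);     // columns returned per Result for wide rows
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }
        }
    }
}
```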
