MySQL Summary#

I. Four Major Characteristics of Transactions (ACID)#

Atomicity#

Atomicity means that a transaction is an indivisible unit of work; the operations within a transaction either all occur or none occur.

Consistency#

If the database is in an integral state before a transaction executes, it remains in an integral state after the transaction ends, regardless of whether the transaction succeeded. (A database is in an integral state when all of its data satisfies all constraints defined on it.)

Isolation#

The isolation of a transaction means that when multiple users concurrently access the database, one user's transaction cannot be interfered with by other users' transactions; data between multiple concurrent transactions must be isolated from each other.

Durability#

Durability means that once a transaction is committed, the changes it made to the data in the database are permanent, and subsequent database failures should not affect this.
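Concretely, atomicity and durability are what you rely on when grouping statements into one transaction. A minimal sketch (the accounts table and values here are hypothetical):

```sql
-- Transfer 100 between two accounts: both updates succeed or neither does.
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;   -- once this returns, the change survives crashes (durability)
-- On error, ROLLBACK undoes both updates (atomicity).
```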

Extensions#

II. Database Isolation Levels#

Read Uncommitted#

Read uncommitted data; at this isolation level, all transactions can see the execution results of other uncommitted transactions. This isolation level is rarely used in practical applications because its performance is not significantly better than other levels. Reading uncommitted data is also known as Dirty Read.

Read Committed#

Read committed data; this is the default isolation level for most database systems (but not for MySQL). It satisfies the simple definition of isolation: a transaction sees only changes made by already-committed transactions. This level can lead to what is known as Nonrepeatable Read: other transactions may commit new changes while this one is running, so the same SELECT may return different results within a single transaction.

Repeatable Read#

This is the default transaction isolation level for MySQL, ensuring that multiple instances of the same transaction see the same rows of data when reading concurrently. However, theoretically, this can lead to another tricky problem: Phantom Read.

Serializable#

Serialization; this is the highest isolation level, which prevents conflicts between transactions by forcing them to be ordered. In simple terms, it adds shared locks to each row of data read. At this level, it may lead to a large number of timeouts and lock contention.
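The isolation level can be inspected and changed per session. A minimal sketch (note that on MySQL versions before 8.0 the variable is named tx_isolation rather than transaction_isolation):

```sql
SELECT @@transaction_isolation;                          -- e.g. REPEATABLE-READ
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- this session only
SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- new sessions only
```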

| Isolation Level  | Dirty Read | Nonrepeatable Read | Phantom Read |
| ---------------- | ---------- | ------------------ | ------------ |
| Read Uncommitted | ✔️         | ✔️                 | ✔️           |
| Read Committed   | ✖️         | ✔️                 | ✔️           |
| Repeatable Read  | ✖️         | ✖️                 | ✔️           |
| Serializable     | ✖️         | ✖️                 | ✖️           |

Dirty Read#

If one transaction has updated a piece of data and another transaction reads that same data before the first commits, then if the first transaction rolls back for some reason, the data read by the second transaction is invalid.

Phantom Read#

In two queries of a transaction, the number of data entries is inconsistent. For example, one transaction queries several rows of data, while another transaction inserts new rows of data at that time. The previous transaction will find that there are several rows of data that it did not have in the subsequent query. The InnoDB and Falcon storage engines solve this problem through a Multi-Version Concurrency Control (MVCC) mechanism.

III. MySQL Lock Mechanism#

Reference link: MySQL Lock Summary
Related concept: the happens-before semantics in Java (the volatile keyword).

Shared Lock and Exclusive Lock#

  • Shared Lock (Read Lock): Other transactions can read but cannot write.
  • Exclusive Lock (Write Lock): Other transactions cannot read or write.

Lock Granularity#

  • Table-Level Lock

    1. Low overhead, quick to lock; deadlocks do not occur; large locking granularity, highest probability of lock conflicts, and lowest concurrency.
    2. Table-level locks are more suitable for query-oriented applications with few concurrent users and only a small amount of data updated by index conditions, such as web applications.
    3. These storage engines avoid deadlocks by always acquiring all necessary locks at once and always acquiring table locks in the same order.
  • Row-Level Lock

    1. High overhead, slow to lock; deadlocks can occur; smallest locking granularity, lowest probability of lock conflicts, and highest concurrency.
    2. Maximally supports concurrency while also bringing the highest lock overhead.
    3. Row-level locks are implemented only at the storage engine level, while the MySQL server layer does not implement them. Row-level locks are more suitable for applications with a large number of concurrent updates to a small amount of different data, along with concurrent queries, such as some online transaction processing (OLTP) systems.
  • Page Lock: The overhead and locking time are between table locks and row locks; deadlocks can occur; locking granularity is between table locks and row locks, with general concurrency.

By default, both table locks and row locks are automatically obtained without additional commands. However, in some cases, users need to explicitly lock tables or control transactions to ensure the integrity of the entire transaction, which requires using transaction control and locking statements.

Various Engine Locks#

  • MyISAM and MEMORY storage engines use table-level locking.
  • BDB storage engine uses page-level locking, but also supports table-level locking.
  • InnoDB storage engine supports both row-level locking and table-level locking, but defaults to row-level locking.
    In InnoDB, locks are acquired incrementally except for transactions composed of a single SQL statement, which means that deadlocks can occur in InnoDB.
    

MyISAM Table Locks#

MyISAM Table-Level Lock Modes#

  • Table Shared Read Lock: Does not block other users' read requests to the same table but blocks write requests to the same table.

  • Table Exclusive Write Lock: Blocks other users' read and write operations on the same table.

    Read operations and write operations on MyISAM tables are serial. When a thread acquires a write lock on a table, only the thread holding the lock can update the table. Other threads' read and write operations will wait until the lock is released.

By default, write locks have a higher priority than read locks. When a lock is released, it is granted first to requests waiting in the write lock queue, and only then to requests waiting in the read lock queue.

    This is also the reason why MyISAM tables are not suitable for applications with a large number of update and query operations, as a large number of update operations can make it difficult for query operations to obtain read locks, potentially causing indefinite blocking.

MyISAM Adding Table Lock Method#

In the case of automatic locking, MyISAM always acquires all the locks needed for the SQL statement at once, which is why MyISAM tables do not experience deadlocks (Deadlock Free).

The MyISAM storage engine supports concurrent inserts to reduce contention between read and write operations on a given table:

If there are no free blocks in the middle of a MyISAM table's data file, rows are always inserted at the end of the data file. In that case, you can freely mix concurrent INSERT and SELECT statements on the table without locking it: one thread can insert rows while other threads read. Free blocks in the middle of the file can arise from deleted or updated rows; when they exist, concurrent inserts are disabled, but once all the free blocks have been filled with new data, concurrent inserts are automatically re-enabled. This behavior is controlled by MySQL's concurrent_insert system variable:

  • When concurrent_insert is set to 0, concurrent inserts are not allowed.
  • When concurrent_insert is set to 1, if there are no holes in the MyISAM table (i.e., no deleted rows in the middle of the table), MyISAM allows one thread to read the table while another thread inserts records from the end of the table. This is also the default setting for MySQL.
  • When concurrent_insert is set to 2, concurrent inserts are allowed at the end of the table regardless of whether there are holes in the MyISAM table.
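The settings above can be inspected and changed like this (a sketch; changing the global value requires sufficient privileges):

```sql
SHOW VARIABLES LIKE 'concurrent_insert';  -- 0 = NEVER, 1 = AUTO, 2 = ALWAYS
SET GLOBAL concurrent_insert = 2;         -- allow inserts even with holes
```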

Querying Table-Level Lock Contention#

You can analyze table lock contention on the system by checking the Table_locks_waited and Table_locks_immediate status variables. If the value of Table_locks_waited is high, it indicates serious table-level lock contention:

mysql> SHOW STATUS LIKE 'Table%';
+-----------------------+---------+
| Variable_name         | Value   |
+-----------------------+---------+
| Table_locks_immediate | 1151552 |
| Table_locks_waited    | 15324   |
+-----------------------+---------+

InnoDB Row-Level Locks and Table-Level Locks#

InnoDB Lock Modes#

InnoDB implements the following two types of row locks:

  • Shared Lock (S): Allows one transaction to read a row, preventing other transactions from obtaining exclusive locks on the same dataset.
  • Exclusive Lock (X): Allows the transaction holding the exclusive lock to update data, preventing other transactions from obtaining shared read locks and exclusive write locks on the same dataset.

To allow row locks and table locks to coexist, InnoDB also has two types of intention locks (Intention Locks) used internally, both of which are table locks:

  • Intention Shared Lock (IS): The transaction intends to add a shared lock to a data row; the transaction must first acquire the IS lock on the table before adding a shared lock to a data row.
  • Intention Exclusive Lock (IX): The transaction intends to add an exclusive lock to a data row; the transaction must first acquire the IX lock on the table before adding an exclusive lock to a data row.

InnoDB Locking Method#

  • Intention locks are automatically added by InnoDB and do not require user intervention.
  • For UPDATE, DELETE, and INSERT statements, InnoDB automatically adds exclusive locks (X) to the involved datasets.
  • For ordinary SELECT statements, InnoDB does not add any locks.
  • Transactions can explicitly add shared or exclusive locks to record sets using the following statements:
    (1) Shared Lock (S): SELECT * FROM table_name WHERE ... LOCK IN SHARE MODE
    (2) Exclusive Lock (X): SELECT * FROM table_name WHERE ... FOR UPDATE.
    
Implicit Locking#

InnoDB uses a two-phase locking protocol (associative memory to 2pc protocol) during transaction execution:

  • Locking can be executed at any time; InnoDB will automatically add locks as needed based on the isolation level.
  • Locks are only released when a commit or rollback is executed, and all locks are released simultaneously.
Explicit Locking#

select ... lock in share mode  -- Shared Lock
select ... for update          -- Exclusive Lock
  • select for update
    1. When executing this select query, it will apply an exclusive lock (X lock) to the corresponding index access entry, meaning that the lock corresponding to this statement is equivalent to the effect of an update.
    2. The usage scenario for select ... for update: when you need the data you read to be the latest and want only the current transaction to be able to modify it, add the for update clause.
    3. Other sessions can query the record but cannot apply shared or exclusive locks on it; they must wait to acquire the lock.
  • select lock in share mode
    1. The in share mode clause adds a share lock to the retrieved data, indicating that other transactions can only perform simple select operations on this data and cannot perform DML operations.

    2. To ensure that the retrieved data has not been modified by other transactions, meaning that the retrieved data is the latest data, and that other transactions are not allowed to modify the data. However, the current transaction may not be able to modify the data because other transactions may have also applied an S lock to this data using in share mode.

    3. Other sessions can still query the record and can also place a share mode shared lock on it. However, if the current transaction then needs to update that record, it can easily cause a deadlock.

    • The difference between the two: for update is an exclusive lock (X lock); once a transaction acquires this lock, other transactions cannot perform operations on this data; lock in share mode is a shared lock, allowing multiple transactions to perform operations on the same data simultaneously.
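The contrast between the two locking reads can be seen with two sessions. A sketch, assuming a hypothetical table t with primary key id:

```sql
-- Session A: take an exclusive (X) lock on one row.
START TRANSACTION;
SELECT * FROM t WHERE id = 1 FOR UPDATE;

-- Session B: a plain SELECT still succeeds (consistent non-locking read),
SELECT * FROM t WHERE id = 1;
-- but any locking read on the same row blocks until A commits or rolls back:
SELECT * FROM t WHERE id = 1 LOCK IN SHARE MODE;  -- blocks
```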

InnoDB Row Lock Implementation#

  • InnoDB row locks are implemented by locking the index entries on the index, which is different from Oracle, which locks the corresponding data rows in the data block. This characteristic of InnoDB's row lock implementation means that InnoDB only uses row-level locks when retrieving data through index conditions; otherwise, InnoDB will use table locks!
  • Whether using primary key indexes, unique indexes, or ordinary indexes, InnoDB will use row locks to lock data.
  • Row locks can only be used when the execution plan actually uses the index: even if index columns appear in the conditions, whether the index is used to retrieve data is decided by MySQL based on the cost of the different execution plans. If MySQL decides a full table scan is cheaper, for example on a very small table, it will not use the index, and InnoDB will use table locks instead of row locks. Therefore, when analyzing lock conflicts, don't forget to check the SQL execution plan to confirm whether the index is actually used.
  • Since MySQL's row locks are locks on indexes, not on records, even if multiple sessions access different records, if they use the same index key, lock conflicts will occur (subsequent sessions using these indexes will need to wait for the session that first used the index to release the lock before they can acquire the lock).
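Whether a locking statement actually takes row locks can be checked with EXPLAIN. A sketch, assuming a hypothetical table t where id is the primary key and name has no index:

```sql
EXPLAIN SELECT * FROM t WHERE id = 1 FOR UPDATE;
-- type: const → the index is used, so only that row is locked.
EXPLAIN SELECT * FROM t WHERE name = 'x' FOR UPDATE;
-- type: ALL → full scan; every scanned row is locked, effectively a table lock.
```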

MySQL Lock Algorithm#

  • Record Lock: Lock on a single row record.
  • Gap Lock: A gap lock locks a range but does not include the record itself. The purpose of GAP locks is to prevent phantom reads from occurring during two current reads of the same transaction.
  • Next-Key Lock: Record + Gap, locks a range and the record itself. This method is used for row queries, primarily to solve the phantom read problem.
  • Further reading: what locks do the various SQL statements actually take?
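A sketch of a next-key lock in action under REPEATABLE READ, assuming a hypothetical table t with an ordinary index on age:

```sql
-- Session A: a current read over a range takes next-key locks,
-- covering the matching records AND the gaps between them.
START TRANSACTION;
SELECT * FROM t WHERE age BETWEEN 20 AND 30 FOR UPDATE;

-- Session B: inserting into the locked range now blocks,
-- which is how phantom reads are prevented for current reads.
INSERT INTO t (age) VALUES (25);  -- blocks until A commits or rolls back
```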

IV. MySQL's MVCC Mechanism#

What is the MVCC Mechanism?#

In InnoDB, two hidden columns are added to each row record to implement MVCC, storing the row's creation version number and deletion version number; each transaction is assigned a unique, monotonically incrementing version number when it starts.

Consistent Non-Locking Read#

Consistent read allows InnoDB to provide a snapshot of the database at a certain point in time using multiple versions. If the isolation level is REPEATABLE READ, then all consistent reads in the same transaction read the snapshot obtained from the first such read in that transaction; if it is READ COMMITTED, then each consistent read in a transaction will read its own refreshed snapshot version. Consistent read is the default mode for ordinary SELECT statements under READ COMMITTED and REPEATABLE READ isolation levels. Consistent read does not add any form of locks to the tables it accesses, allowing other transactions to concurrently modify them.

Solving Nonrepeatable Reads#

When an MVCC database needs to update a data record, it does not directly overwrite the old data with new data but marks the old data as obsolete and adds the new version of the data elsewhere. This results in multiple versions of the data being stored, but only one is the latest. This approach allows readers to read data that existed before they read it, even if that data was modified or deleted by others during the read process, without affecting users who were reading earlier. It ensures that multiple reads of the same data within the same transaction return the same result, solving the nonrepeatable read problem.

The downside is:
This multi-version approach avoids the overhead of filling gaps caused by delete operations in memory and disk storage structures but requires the system to periodically clean up (sweep through) to truly delete old, obsolete data.

In summary: MVCC is a way to temporarily retain multiple versions of the same data to achieve concurrency control.

  • Learn about the MVCC mechanism through etcd: etcd

V. MySQL Storage Engines#

InnoDB#

Introduction#

  1. Supports ACID transactions and the four isolation levels of transactions;
  2. Supports row-level locks and foreign key constraints; therefore, it can support write concurrency;
  3. Does not store the total number of rows;
  4. An InnoDB table is stored either in the shared tablespace (table size is not controlled by the operating system; a table may be spread across multiple files) or, when the independent tablespace option is set, in its own files (table size is then limited by the operating system's file size limit, generally 2 GB);
  5. Primary key indexes are clustered indexes (the index's data field stores the data rows themselves), and the data field of an auxiliary index stores the primary key value; therefore, a lookup through an auxiliary index first finds the primary key value and then fetches the row through the clustered (primary key) index;
  6. It is best to use auto-increment primary keys to prevent large adjustments to the file structure to maintain the B+ tree structure during data insertion;
  7. Suitable for OLTP (Online Transaction Processing), with high real-time requirements.
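Point 6 above is why a table definition like this sketch (names are illustrative) usually pairs InnoDB with an auto-increment primary key: new rows append at the right edge of the clustered B+ tree instead of forcing page splits in the middle:

```sql
CREATE TABLE user (
  id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(64) NOT NULL,
  PRIMARY KEY (id),            -- clustered index: leaves hold the rows
  KEY idx_name (name)          -- secondary index: leaves hold (name, id)
) ENGINE=InnoDB;
```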

Main Features#

Insert buffer, double write, adaptive hash index, asynchronous IO, flush neighbor page.

References#

MyISAM#

  1. Does not support transactions, but each query is atomic;
  2. Supports table-level locks, meaning each operation locks the entire table;
  3. Stores the total number of rows in the table;
  4. A MyISAM table consists of three files: index file, table structure file, and data file;
  5. Uses non-clustered indexes, where the data field of the index file stores pointers to the data file. The auxiliary index is basically the same as the primary index, but the auxiliary index does not need to guarantee uniqueness.
  6. Suitable for OLAP (Online Analytical Processing), with lower real-time requirements but generally large data volumes.

MEMORY#

ARCHIVE#

VI. MySQL Indexes#

The main types are B+ tree indexes and hash indexes; the differences are:#

  1. If it is an equality query, then hash indexes have a clear advantage, as they only need to go through the algorithm once to find the corresponding key value; of course, this premise is that the key values are unique. If the key values are not unique, you need to first find the position of the key and then scan through the linked list until the corresponding data is found.
  2. If it is a range query, hash indexes are useless, as the originally ordered key values may become discontinuous after hashing, making it impossible to utilize the index for range query retrieval.
  3. Similarly, hash indexes cannot be used for sorting or partial fuzzy queries like 'xxx%'. (This type of partial fuzzy query is essentially also a range query.)
  4. Hash indexes do not support the leftmost matching rule for multi-column composite indexes.
  5. The keyword retrieval efficiency of B+ tree indexes is relatively average, unlike B trees, which have large fluctuations. In cases with a large number of duplicate key values, the efficiency of hash indexes is also very low due to the so-called hash collision problem.
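Hash indexes can be requested explicitly on MEMORY tables (InnoDB builds only B+ tree indexes; its adaptive hash index is internal and automatic). A sketch:

```sql
CREATE TABLE lookup (
  k INT NOT NULL,
  v VARCHAR(32),
  INDEX USING HASH (k)   -- fast for k = ?, useless for ranges or ORDER BY
) ENGINE=MEMORY;
```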

B+ Index Data Structure and Differences from B-Tree#

  1. B-Tree: Ordered Array + Balanced Multi-Way Tree
    Its characteristics are:
    (1) It is no longer a binary search but an m-way search;
    (2) Both leaf nodes and non-leaf nodes store data;
    (3) In-order traversal can obtain all nodes;

  2. B+ Tree: Ordered Array Linked List + Balanced Multi-Way Tree, with some improvements over B-Tree
    (1) Non-leaf nodes no longer store data; data is only stored in the leaf nodes at the same level.
    (2) A linked list is added between the leaves, so obtaining all nodes no longer requires in-order traversal.

  3. Improved Characteristics of B+ Tree
    (1) For range searches, after locating min and max, the intermediate leaf nodes are the result set, eliminating the need for in-order backtracking.
    (2) Leaf nodes store actual record rows, with record rows stored relatively tightly, suitable for large data volume disk storage; non-leaf nodes store the PK for query acceleration, suitable for memory storage.
    (3) If non-leaf nodes do not store actual records but only store the record's KEY, then under the same memory conditions, B+ trees can store more indexes.

Why B+ Trees are Suitable as Index Structures#

  1. Unlike binary search trees, B trees are m-way, significantly reducing tree height, allowing for the storage of large amounts of data.
  2. Very suitable for disk storage, fully utilizing the principle of locality and disk pre-reading.
    (1) Disk reads/writes are much slower than memory reads/writes;
    (2) Disk pre-reading: Disk reads are not done on demand but are pre-read by pages, reading a page (4K of data) at a time. If the data to be read in the future is in this page, it can avoid future disk IO, improving efficiency.
    (3) The principle of locality: Software design should follow the principle of "data reading concentration" and "if one data is used, it is likely that nearby data will also be used," allowing disk pre-reading to significantly improve disk IO.
  3. Both MyISAM and InnoDB use B+ trees as their index storage structure, but they store different things at the leaves. MyISAM separates index files from data files: the data field of its index leaves stores pointers to the records' physical locations in the data file. InnoDB's clustered index leaves store the data rows themselves, and its auxiliary index leaves store primary key values (an auxiliary index lookup therefore goes through the primary key, performing a second lookup and increasing the number of IOs).

Index Classification#

  • Normal Index: The most basic index with no restrictions.
  • Unique Index: Similar to "normal index," but the values of the indexed columns must be unique, allowing for null values.
  • Primary Key Index: A special type of unique index that does not allow null values.
  • Full-Text Index: Only available for MyISAM tables; generating a full-text index for large data is time-consuming and space-consuming. (MATCH... AGAINST...)
  • Composite Index: To further improve MySQL efficiency, composite indexes can be established, following the "leftmost prefix" principle.
  • Covering Index: An index that includes (covers) the values of all fields that need to be queried.
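The leftmost-prefix rule and covering indexes can be illustrated with a sketch (table and column names are hypothetical):

```sql
ALTER TABLE t ADD INDEX idx_abc (a, b, c);

SELECT * FROM t WHERE a = 1;             -- uses idx_abc (leftmost prefix)
SELECT * FROM t WHERE a = 1 AND b = 2;   -- uses idx_abc
SELECT * FROM t WHERE b = 2 AND c = 3;   -- cannot use idx_abc: a is missing

-- Covering index: every selected column is inside the index,
-- so no lookup of the data row is needed (Extra shows "Using index").
SELECT a, b, c FROM t WHERE a = 1 AND b = 2;
```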

Explain Simulated SQL Query Plan#

Information Contained in Explain Execution Plan#

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra

Detailed Corresponding Meanings#

  • id: The sequence number of the select query, containing a group of numbers representing the order of executing select clauses or operating tables in the query.

  • select_type: The type of query, divided into:

    1. SIMPLE: A simple select query that does not contain subqueries or unions.
    2. PRIMARY: The outermost query marked as primary if it contains any complex sub-parts.
    3. SUBQUERY: Contains a subquery in the select or where list.
    4. DERIVED: A subquery in the from list marked as derived, MySQL recursively executes these subqueries and puts the results in a temporary table.
    5. UNION: If a second select appears after a union, it is marked as union; if the union is contained in a subquery in the from clause, the outer select will be marked as derived.
    6. UNION RESULT: The select that retrieves results from the union table.
  • type: Access type, a very important metric in SQL query optimization, with values ranked from good to bad as follows:
    system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL
    Generally, a good SQL query should at least reach the range level, preferably ref.

    1. system: The table has only one record (equal to a system table), which is a special case of const and usually does not occur, can be ignored.
    2. const: Indicates that it found the record through the index in one go; const is used for comparing primary keys or unique indexes. Since only one row of data needs to be matched, it is very fast. If the primary key is placed in the where list, MySQL can convert that query into a const.
    3. eq_ref: Unique index scan; for each index key, there is only one record in the table that matches it. Commonly seen in primary key or unique index scans.
    4. ref: Non-unique index scan; returns all rows matching a single value. Essentially, it is also a type of index access, returning all rows matching a single value, but it may find multiple matching rows, so it should be considered a hybrid of lookup and scan.
    5. range: Only retrieves rows within a given range, using an index to select rows. The key column shows which index was used. Generally, this occurs when the where clause includes between, <, >, in, etc.
    6. index: Full Index Scan; the difference between index and ALL is that index traverses only the index tree, which is usually faster than ALL because index files are typically smaller than data files. (Both index and ALL read the entire table, but index reads from the index while ALL reads from the data file on disk.)
    7. ALL: Full Table Scan; traverses the entire table to find matching rows.
  • possible_keys: If there are indexes on the fields involved in the query, those indexes will be listed, but they may not necessarily be used in the query.

  • key: The actual index used; if NULL, no index was used. If a covering index was used in the query, that index will only appear in the key list.

  • key_len: Indicates the number of bytes used in the index, the length of the index used in the query (maximum possible length), and not the actual length used; theoretically, the shorter the length, the better. key_len is calculated based on the table definition, not retrieved from the table.

  • ref: Shows which columns or constants are compared against the index to select rows from the table.

  • rows: Based on table statistics and index selection, it roughly estimates the number of rows that need to be read to find the required records.

  • Extra: Additional information that is not suitable to be displayed in other fields but is very important.

    1. Using filesort: MySQL uses an external index to sort data rather than sorting using the index within the table. This means MySQL cannot utilize the index to complete the sorting operation, resulting in "file sorting."
    2. Using temporary: A temporary table is used to save intermediate results, meaning MySQL used a temporary table when sorting the query results, commonly seen in order by and group by.
    3. Using index: Indicates that the corresponding select operation used a covering index (Covering Index), avoiding access to the data rows of the table, thus improving efficiency; if it appears simultaneously with Using where, it indicates that the index was used to perform a lookup of the index key values; if it does not appear simultaneously, it indicates that the index was used to read data rather than perform a lookup.
    4. Using Where: Indicates that where filtering was used.
    5. Using join buffer: Indicates that a join buffer was used.
    6. Impossible WHERE: The values in the where clause are always false, and cannot be used to retrieve any tuples.
    7. select tables optimized away: In the absence of a group by clause, it optimizes MIN/MAX operations based on indexes or optimizes COUNT(*) operations for MyISAM storage engines, completing the optimization during the query execution plan generation phase rather than waiting for the execution phase.
    8. distinct: Optimizes distinct operations, stopping the search for the same value after finding the first matching tuple.
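In practice you simply prefix the query under analysis with EXPLAIN and read the columns above. A sketch, assuming a hypothetical orders table with an index on user_id:

```sql
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- Things to check: type is at least range (ideally ref or const),
-- key shows the index you expect, and Extra is free of
-- "Using filesort" and "Using temporary".
```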

Differences Between Clustered Index and Non-Clustered Index#

Clustered Index#

Definition: The physical order of data rows is the same as the logical order of column values (usually the column of the primary key); a table can only have one clustered index. If no primary key is defined, a unique non-null index will be chosen as a substitute; if no such index exists, a primary key will be implicitly defined as the clustered index.

Non-Clustered Index#

Definition: The logical order of the index differs from the physical storage order of rows on disk; a table can have multiple non-clustered indexes.

References#

VII. Master-Slave Replication in Databases#

  • Even if MySQL is split into multiple instances, there must be a master and a slave; all write operations must be completed on the master MySQL.
  • All data on the slave MySQL comes from (is synchronized with) the master MySQL.
  • In a MySQL master-slave setup, if a single business operation (a method in a service) contains both R and W operations, then since the W operations must be performed on the master MySQL, all of that operation's data access must go through the master within a single transaction.
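Replication health and lag are usually checked on the slave (a sketch; on MySQL 8.0.22+ the statement is SHOW REPLICA STATUS):

```sql
SHOW SLAVE STATUS\G
-- Key fields: Slave_IO_Running / Slave_SQL_Running (both should be Yes)
-- and Seconds_Behind_Master (replication lag in seconds).
```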

Extensions#

How to Optimize Long Delays in MySQL Master-Slave

VIII. Normalization Design#

  • The First Normal Form (1NF) is a basic requirement for relational schemas; a database that does not satisfy the first normal form (1NF) is not a relational database, meaning that each column in the database table must be an indivisible basic data item, and the same column cannot have multiple values.
  • The Second Normal Form (2NF) requires that each instance or row in the database table must be uniquely distinguishable. That is, there is no partial dependency between fields and the primary key.
  • The Third Normal Form (3NF) requires that a database table does not contain non-primary key information already included in other tables. That is, based on the second normal form, there should be no transitive dependencies (redundant data is not allowed).

IX. Extensions#
