17.3 InnoDB 多版本控制
InnoDB
is a multi-version storage engine. It keeps information about old versions of changed rows to support transactional features such as concurrency and rollback. This information is stored in undo tablespaces in a data structure called a rollback segment. See Section 17.6.3.4, “Undo Tablespaces”. InnoDB
uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read. See Section 17.7.2.3, “Consistent Nonlocking Reads”.
Internally, InnoDB
adds three fields to each row stored in the database:
-
A 6-byte
DB_TRX_ID
field indicates the transaction identifier for the last transaction that inserted or updated the row. Also, a deletion is treated internally as an update where a special bit in the row is set to mark it as deleted. -
A 7-byte
DB_ROLL_PTR
field called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated. -
A 6-byte
DB_ROW_ID
field contains a row ID that increases monotonically as new rows are inserted. IfInnoDB
generates a clustered index automatically, the index contains row ID values. Otherwise, theDB_ROW_ID
column does not appear in any index.
Undo logs in the rollback segment are divided into insert and update undo logs. Insert undo logs are needed only in transaction rollback and can be discarded as soon as the transaction commits. Update undo logs are used also in consistent reads, but they can be discarded only after there is no transaction present for which InnoDB
has assigned a snapshot that in a consistent read could require the information in the update undo log to build an earlier version of a database row. For additional information about undo logs, see Section 17.6.6, “Undo Logs”.
It is recommend that you commit transactions regularly, including transactions that issue only consistent reads. Otherwise, InnoDB
cannot discard data from the update undo logs, and the rollback segment may grow too big, filling up the undo tablespace in which it resides. For information about managing undo tablespaces, see Section 17.6.3.4, “Undo Tablespaces”.
The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space needed for your rollback segment.
In the InnoDB
multi-versioning scheme, a row is not physically removed from the database immediately when you delete it with an SQL statement. InnoDB
only physically removes the corresponding row and its index records when it discards the update undo log record written for the deletion. This removal operation is called a purge, and it is quite fast, usually taking the same order of time as the SQL statement that did the deletion.
If you insert and delete rows in smallish batches at about the same rate in the table, the purge thread can start to lag behind and the table can grow bigger and bigger because of all the “dead” rows, making everything disk-bound and very slow. In such cases, throttle new row operations, and allocate more resources to the purge thread by tuning the innodb_max_purge_lag
system variable. For more information, see Section 17.8.9, “Purge Configuration”.
Multi-Versioning and Secondary 索引
InnoDB
multiversion concurrency control (MVCC) treats secondary indexes differently than clustered indexes. Records in a clustered index are updated in-place, and their hidden system columns point undo log entries from which earlier versions of records can be reconstructed. Unlike clustered index records, secondary index records do not contain hidden system columns nor are they updated in-place.
When a secondary index column is updated, old secondary index records are delete-marked, new records are inserted, and delete-marked records are eventually purged. When a secondary index record is delete-marked or the secondary index page is updated by a newer transaction, InnoDB
looks up the database record in the clustered index. In the clustered index, the record's DB_TRX_ID
is checked, and the correct version of the record is retrieved from the undo log if the record was modified after the reading transaction was initiated.
If a secondary index record is marked for deletion or the secondary index page is updated by a newer transaction, the covering index technique is not used. Instead of returning values from the index structure, InnoDB
looks up the record in the clustered index.
However, if the index condition pushdown (ICP) optimization is enabled, and parts of the WHERE
condition can be evaluated using only fields from the index, the MySQL server still pushes this part of the WHERE
condition down to the storage engine where it is evaluated using the index. If no matching records are found, the clustered index lookup is avoided. If matching records are found, even among delete-marked records, InnoDB
looks up the record in the clustered index.