25.7.11 NDB Cluster 复制 Using the Multithreaded Applier
NDB replication in NDB 8.4 supports the use of the generic MySQL Server Multithreaded Applier mechanism (MTA), which allows independent binary log transactions to be applied in parallel on a replica, increasing peak replication throughput.
The MySQL Server MTA implementation delegates the processing of separate binary log transactions to a pool of worker threads (whose size is configurable), and coordinates the worker threads to ensure that transaction dependencies encoded in the binary log are respected, and that commit ordering is maintained if required (see Section 19.2.3, “Replication Threads”). To use this functionality with NDB Cluster, it is necessary that the replica be configured to use multiple worker threads. To do this, set replica_parallel_workers
to control the number of worker threads on the replica. The default is 4.
If set on the source mysqld, replica_parallel_type
must be LOGICAL_CLOCK
(the default value).
NDB
does not support replica_parallel_type=DATABASE
.
In addition, it is recommended that you set the amount of memory used to track binary log transaction writesets on the source (binlog_transaction_dependency_history_size
) to
, where E
* P
E
is the average epoch size (as the number of operations per epoch) and P
is the maximum expected parallelism. See Writeset Tracking Memory Usage, for more information.
Replica mysqld configuration for the NDB
MTA requires that replica_parallel_workers
is greater than 1. The recommended starting value when first enabling MTA is 4, which is the default.
In addition, replica_preserve_commit_order
must be ON
. This is also the default value.
Transaction dependencies are detected using analysis of each transaction's writeset, that is, the set of rows (table, key values) written by the transaction. Where two transactions modify the same row they are considered to be dependent, and must be applied in order (in other words, serially) to avoid deadlocks or incorrect results. Where a table has secondary unique keys, these values are also added to the transaction's writeset to detect the case where there are transaction dependencies implied by different transactions affecting the same unique key value, and so requiring ordering. Where dependencies cannot be efficiently determined, mysqld falls back to considering transactions dependent for reasons of safety.
Transaction dependencies are encoded in the binary log by the source mysqld. Dependencies are encoded in an ANONYMOUS_GTID
event using a scheme called 'Logical clock'. (See Section 19.1.4.1, “Replication Mode Concepts”.)
The writeset implementation employed by MySQL (and NDB Cluster) uses hash-based conflict detection based on matching 64-bit row hashes of relevant table and index values. This detects reliably when the same key is seen twice, but can also produce false positives if different table and index values hash to the same 64-bit value; this may result in artificial dependencies which can reduce the available parallelism.
Transaction dependencies are forced by any of the following:
-
DDL statements
-
Binary log rotation or encountering binary log file boundaries
-
Writeset history size limitations
-
Writes which reference parent foreign keys in the target table
More specifically, transactions which perform inserts, updates, and deletes on foreign key parent tables are serialized relative to all preceding and following transactions, and not just to those transactions affecting tables involved in a constraint relationship. Conversely, transactions performing inserts, updates and deletes on foreign key child tables (referencing) are not especially serialized with regard to one another.
The MySQL MTA implementation attempts to apply independent binary log transactions in parallel. NDB
records all changes occurring in all user transactions committing in an epoch (TimeBetweenEpochs
, default 100 milliseconds), in one binary log transaction, referred to as an epoch transaction. Therefore, for two consecutive epoch transactions to be independent, and possible to apply in parallel, it is required that no row is modified in both epochs. If any single row is modified in both epochs, then they are dependent, and are applied serially, which can limit the expolitable parallelism available.
Epoch transactions are considered independent based on the set of rows modified on the source cluster in the epoch, but not including the generated mysql.ndb_apply_status
WRITE_ROW
events that convey epoch metadata. This avoids every epoch transaction being trivially dependent on the preceding epoch, but does require that the binlog is applied at the replica with the commit order preserved. This also implies that an NDB binary log with writeset dependencies is not suitable for use by a replica database using a different MySQL storage engine.
It may be possible or desirable to modify application transaction behavior to avoid patterns of repeated modifications to the same rows, in separate transactions over a short time period, to increase exploitable apply parallelism.
The amount of memory used to track binary log transaction writesets can be set using the binlog_transaction_dependency_history_size
server system variable, which defaults to 25000 row hashes.
If an average binary log transaction modifies N
rows, then to be able to identify independent (parallelizable) transactions up to a parallelism level of P
, we need binlog_transaction_dependency_history_size
to be at least
. (The maximum is 1000000.)N
* P
The finite size of the history results in a finite maximum dependency length that can be reliably determined, giving a finite parallelism that can be expressed. Any row not found in the history may be dependent on the last transaction purged from the history.
Writeset history does not act like a sliding window over the last N
transactions; rather, it is a finite buffer which is allowed to fill up completely, then its contents entirely discarded when it becomes full. This means that the history size follows a sawtooth pattern over time, and therefore the maximum detectable dependency length also follows a sawtooth pattern over time, such that independent transactions may still be marked as dependent if the writeset history buffer has been reset between their being processed.
In this scheme, each transaction in a binary log file is annotated with a sequence_number
(1, 2, 3, ...), and as well as the sequence number of the most recent binary log transaction that it depends on, to which we refer as last_committed
.
Within a given binary log file, the first transaction has sequence_number
1 and last_committed
0.
Where a binary log transaction depends on its immediate predecessor, its application is serialized. If the dependency is on an earlier transaction then it may be possible to apply the transaction in parallel with the preceding independent transactions.
The content of ANONYMOUS_GTID
events, including sequence_number
and last_committed
(and thus the transaction dependencies), can be seen using mysqlbinlog.
The ANONYMOUS_GTID
events generated on the source are handled separately from the compressed transaction payload with bulk BEGIN
, TABLE_MAP*
, WRITE_ROW*
, UPDATE_ROW*
, DELETE_ROW*
, and COMMIT
events, allowing dependencies to be determined prior to decompression. This means that the replica coordinator thread can delegate transaction payload decompression to a worker thread, providing automatic parallel decompression of independent transactions on the replica.
Secondary unique columns. Tables with secondary unique columns (that is, unique keys other than the primary key) have all columns sent to the source so that unique-key related conflicts can be detected.
Where the current binary logging mode does not include all columns, but only changed columns (--ndb-log-updated-only=OFF
, --ndb-log-update-minimal=ON
, --ndb-log-update-as-write=OFF
), this can increase the volume of data sent from data nodes to SQL nodes.
The impact depends on both the rate of modification (update or delete) of rows in such tables and the volume of data in columns which are not actually modified.
Replicating NDB to InnoDB. NDB
binary log injector transaction dependency tracking intentionally ignores the inter-transaction dependencies created by generated mysql.ndb_apply_status
metadata events, which are handled separately as part of the commit of the epoch transaction on the replica applier. For replication to InnoDB
, there is no special handling; this may result in reduced performance or other issues when using an InnoDB
multithreaded applier to consume an NDB
MTA binary log.