The persistent optimizer statistics feature improves plan stability by storing statistics to disk and making them persistent across server restarts so that the optimizer is more likely to make consistent choices each time for a given query.
Optimizer statistics are persisted to disk when innodb_stats_persistent=ON
or when individual tables are defined with STATS_PERSISTENT=1
. innodb_stats_persistent
is enabled by default.
Formerly, optimizer statistics were cleared when restarting the server and after some other types of operations, and recomputed on the next table access. Consequently, different estimates could be produced when recalculating statistics leading to different choices in query execution plans and variation in query performance.
Persistent statistics are stored in the mysql.innodb_table_stats
and mysql.innodb_index_stats
tables. See Section 17.8.10.1.5, “InnoDB Persistent Statistics Tables”.
If you prefer not to persist optimizer statistics to disk, see Section 17.8.10.2, “Configuring Non-Persistent Optimizer Statistics Parameters”
The innodb_stats_auto_recalc
variable, which is enabled by default, controls whether statistics are calculated automatically when a table undergoes changes to more than 10% of its rows. You can also configure automatic statistics recalculation for individual tables by specifying the STATS_AUTO_RECALC
clause when creating or altering a table.
Because of the asynchronous nature of automatic statistics recalculation, which occurs in the background, statistics may not be recalculated instantly after running a DML operation that affects more than 10% of a table, even when innodb_stats_auto_recalc
is enabled. Statistics recalculation can be delayed by few seconds in some cases. If up-to-date statistics are required immediately, run ANALYZE TABLE
to initiate a synchronous (foreground) recalculation of statistics.
If innodb_stats_auto_recalc
is disabled, you can ensure the accuracy of optimizer statistics by executing the ANALYZE TABLE
statement after making substantial changes to indexed columns. You might also consider adding ANALYZE TABLE
to setup scripts that you run after loading data, and running ANALYZE TABLE
on a schedule at times of low activity.
When an index is added to an existing table, or when a column is added or dropped, index statistics are calculated and added to the innodb_index_stats
table regardless of the value of innodb_stats_auto_recalc
.
innodb_stats_persistent
, innodb_stats_auto_recalc
, and innodb_stats_persistent_sample_pages
are global variables. To override these system-wide settings and configure optimizer statistics parameters for individual tables, you can define STATS_PERSISTENT
, STATS_AUTO_RECALC
, and STATS_SAMPLE_PAGES
clauses in CREATE TABLE
or ALTER TABLE
statements.
-
STATS_PERSISTENT
specifies whether to enable persistent statistics for anInnoDB
table. The valueDEFAULT
causes the persistent statistics setting for the table to be determined by theinnodb_stats_persistent
setting. A value of1
enables persistent statistics for the table, while a value of0
disables the feature. After enabling persistent statistics for an individual table, useANALYZE TABLE
to calculate statistics after table data is loaded. -
STATS_AUTO_RECALC
specifies whether to automatically recalculate persistent statistics. The valueDEFAULT
causes the persistent statistics setting for the table to be determined by theinnodb_stats_auto_recalc
setting. A value of1
causes statistics to be recalculated when 10% of table data has changed. A value0
prevents automatic recalculation for the table. When using a value of 0, useANALYZE TABLE
to recalculate statistics after making substantial changes to the table. -
STATS_SAMPLE_PAGES
specifies the number of index pages to sample when cardinality and other statistics are calculated for an indexed column, by anANALYZE TABLE
operation, for example.
All three clauses are specified in the following CREATE TABLE
example:
CREATE TABLE `t1` (
`id` int(8) NOT NULL auto_increment,
`data` varchar(255),
`date` datetime,
PRIMARY KEY (`id`),
INDEX `DATE_IX` (`date`)
) ENGINE=InnoDB,
STATS_PERSISTENT=1,
STATS_AUTO_RECALC=1,
STATS_SAMPLE_PAGES=25;
The optimizer uses estimated statistics about key distributions to choose the indexes for an execution plan, based on the relative selectivity of the index. Operations such as ANALYZE TABLE
cause InnoDB
to sample random pages from each index on a table to estimate the cardinality of the index. This sampling technique is known as a random dive.
The innodb_stats_persistent_sample_pages
controls the number of sampled pages. You can adjust the setting at runtime to manage the quality of statistics estimates used by the optimizer. The default value is 20. Consider modifying the setting when encountering the following issues:
-
Statistics are not accurate enough and the optimizer chooses suboptimal plans, as shown in
EXPLAIN
output. You can check the accuracy of statistics by comparing the actual cardinality of an index (determined by runningSELECT DISTINCT
on the index columns) with the estimates in themysql.innodb_index_stats
table.If it is determined that statistics are not accurate enough, the value of
innodb_stats_persistent_sample_pages
should be increased until the statistics estimates are sufficiently accurate. Increasinginnodb_stats_persistent_sample_pages
too much, however, could causeANALYZE TABLE
to run slowly. -
ANALYZE TABLE
is too slow. In this caseinnodb_stats_persistent_sample_pages
should be decreased untilANALYZE TABLE
execution time is acceptable. Decreasing the value too much, however, could lead to the first problem of inaccurate statistics and suboptimal query execution plans.If a balance cannot be achieved between accurate statistics and
ANALYZE TABLE
execution time, consider decreasing the number of indexed columns in the table or limiting the number of partitions to reduceANALYZE TABLE
complexity. The number of columns in the table's primary key is also important to consider, as primary key columns are appended to each nonunique index.For related information, see Section 17.8.10.3, “Estimating ANALYZE TABLE Complexity for InnoDB Tables”.
By default, InnoDB
reads uncommitted data when calculating statistics. In the case of an uncommitted transaction that deletes rows from a table, delete-marked records are excluded when calculating row estimates and index statistics, which can lead to non-optimal execution plans for other transactions that are operating on the table concurrently using a transaction isolation level other than READ UNCOMMITTED
. To avoid this scenario, innodb_stats_include_delete_marked
can be enabled to ensure that delete-marked records are included when calculating persistent optimizer statistics.
When innodb_stats_include_delete_marked
is enabled, ANALYZE TABLE
considers delete-marked records when recalculating statistics.
innodb_stats_include_delete_marked
is a global setting that affects all InnoDB
tables, and it is only applicable to persistent optimizer statistics.
The persistent statistics feature relies on the internally managed tables in the mysql
database, named innodb_table_stats
and innodb_index_stats
. These tables are set up automatically in all install, upgrade, and build-from-source procedures.
Table 17.6 Columns of innodb_table_stats
Column name | Description |
---|---|
database_name |
Database name |
table_name |
Table name, partition name, or subpartition name |
last_update |
A timestamp indicating the last time that InnoDB updated this row |
n_rows |
The number of rows in the table |
clustered_index_size |
The size of the primary index, in pages |
sum_of_other_index_sizes |
The total size of other (non-primary) indexes, in pages |
Table 17.7 Columns of innodb_index_stats
Column name | Description |
---|---|
database_name |
Database name |
table_name |
Table name, partition name, or subpartition name |
index_name |
Index name |
last_update |
A timestamp indicating the last time the row was updated |
stat_name |
The name of the statistic, whose value is reported in the stat_value column |
stat_value |
The value of the statistic that is named in stat_name column |
sample_size |
The number of pages sampled for the estimate provided in the stat_value column |
stat_description |
Description of the statistic that is named in the stat_name column |
The innodb_table_stats
and innodb_index_stats
tables include a last_update
column that shows when index statistics were last updated:
mysql> SELECT * FROM innodb_table_stats \G
*************************** 1. row ***************************
database_name: sakila
table_name: actor
last_update: 2014-05-28 16:16:44
n_rows: 200
clustered_index_size: 1
sum_of_other_index_sizes: 1
...
mysql> SELECT * FROM innodb_index_stats \G
*************************** 1. row ***************************
database_name: sakila
table_name: actor
index_name: PRIMARY
last_update: 2014-05-28 16:16:44
stat_name: n_diff_pfx01
stat_value: 200
sample_size: 1
...
The innodb_table_stats
and innodb_index_stats
tables can be updated manually, which makes it possible to force a specific query optimization plan or test alternative plans without modifying the database. If you manually update statistics, use the FLUSH TABLE
statement to load the updated statistics.tbl_name
Persistent statistics are considered local information, because they relate to the server instance. The innodb_table_stats
and innodb_index_stats
tables are therefore not replicated when automatic statistics recalculation takes place. If you run ANALYZE TABLE
to initiate a synchronous recalculation of statistics, the statement is replicated (unless you suppressed logging for it), and recalculation takes place on replicas.
The innodb_table_stats
table contains one row for each table. The following example demonstrates the type of data collected.
Table t1
contains a primary index (columns a
, b
) secondary index (columns c
, d
), and unique index (columns e
, f
):
CREATE TABLE t1 (
a INT, b INT, c INT, d INT, e INT, f INT,
PRIMARY KEY (a, b), KEY i1 (c, d), UNIQUE KEY i2uniq (e, f)
) ENGINE=INNODB;
After inserting five rows of sample data, table t1
appears as follows:
mysql> SELECT * FROM t1;
+---+---+------+------+------+------+
| a | b | c | d | e | f |
+---+---+------+------+------+------+
| 1 | 1 | 10 | 11 | 100 | 101 |
| 1 | 2 | 10 | 11 | 200 | 102 |
| 1 | 3 | 10 | 11 | 100 | 103 |
| 1 | 4 | 10 | 12 | 200 | 104 |
| 1 | 5 | 10 | 12 | 100 | 105 |
+---+---+------+------+------+------+
To immediately update statistics, run ANALYZE TABLE
(if innodb_stats_auto_recalc
is enabled, statistics are updated automatically within a few seconds assuming that the 10% threshold for changed table rows is reached):
mysql> ANALYZE TABLE t1;
+---------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------+---------+----------+----------+
| test.t1 | analyze | status | OK |
+---------+---------+----------+----------+
Table statistics for table t1
show the last time InnoDB
updated the table statistics (2014-03-14 14:36:34
), the number of rows in the table (5
), the clustered index size (1
page), and the combined size of the other indexes (2
pages).
mysql> SELECT * FROM mysql.innodb_table_stats WHERE table_name like 't1'\G
*************************** 1. row ***************************
database_name: test
table_name: t1
last_update: 2014-03-14 14:36:34
n_rows: 5
clustered_index_size: 1
sum_of_other_index_sizes: 2
The innodb_index_stats
table contains multiple rows for each index. Each row in the innodb_index_stats
table provides data related to a particular index statistic which is named in the stat_name
column and described in the stat_description
column. For example:
mysql> SELECT index_name, stat_name, stat_value, stat_description
FROM mysql.innodb_index_stats WHERE table_name like 't1';
+------------+--------------+------------+-----------------------------------+
| index_name | stat_name | stat_value | stat_description |
+------------+--------------+------------+-----------------------------------+
| PRIMARY | n_diff_pfx01 | 1 | a |
| PRIMARY | n_diff_pfx02 | 5 | a,b |
| PRIMARY | n_leaf_pages | 1 | Number of leaf pages in the index |
| PRIMARY | size | 1 | Number of pages in the index |
| i1 | n_diff_pfx01 | 1 | c |
| i1 | n_diff_pfx02 | 2 | c,d |
| i1 | n_diff_pfx03 | 2 | c,d,a |
| i1 | n_diff_pfx04 | 5 | c,d,a,b |
| i1 | n_leaf_pages | 1 | Number of leaf pages in the index |
| i1 | size | 1 | Number of pages in the index |
| i2uniq | n_diff_pfx01 | 2 | e |
| i2uniq | n_diff_pfx02 | 5 | e,f |
| i2uniq | n_leaf_pages | 1 | Number of leaf pages in the index |
| i2uniq | size | 1 | Number of pages in the index |
+------------+--------------+------------+-----------------------------------+
The stat_name
column shows the following types of statistics:
-
size
: Wherestat_name
=size
, thestat_value
column displays the total number of pages in the index. -
n_leaf_pages
: Wherestat_name
=n_leaf_pages
, thestat_value
column displays the number of leaf pages in the index. -
n_diff_pfx
: WhereNN
stat_name
=n_diff_pfx01
, thestat_value
column displays the number of distinct values in the first column of the index. Wherestat_name
=n_diff_pfx02
, thestat_value
column displays the number of distinct values in the first two columns of the index, and so on. Wherestat_name
=n_diff_pfx
, theNN
stat_description
column shows a comma separated list of the index columns that are counted.
To further illustrate the n_diff_pfx
statistic, which provides cardinality data, consider once again the NN
t1
table example that was introduced previously. As shown below, the t1
table is created with a primary index (columns a
, b
), a secondary index (columns c
, d
), and a unique index (columns e
, f
):
CREATE TABLE t1 (
a INT, b INT, c INT, d INT, e INT, f INT,
PRIMARY KEY (a, b), KEY i1 (c, d), UNIQUE KEY i2uniq (e, f)
) ENGINE=INNODB;
After inserting five rows of sample data, table t1
appears as follows:
mysql> SELECT * FROM t1;
+---+---+------+------+------+------+
| a | b | c | d | e | f |
+---+---+------+------+------+------+
| 1 | 1 | 10 | 11 | 100 | 101 |
| 1 | 2 | 10 | 11 | 200 | 102 |
| 1 | 3 | 10 | 11 | 100 | 103 |
| 1 | 4 | 10 | 12 | 200 | 104 |
| 1 | 5 | 10 | 12 | 100 | 105 |
+---+---+------+------+------+------+
When you query the index_name
, stat_name
, stat_value
, and stat_description
, where stat_name LIKE 'n_diff%'
, the following result set is returned:
mysql> SELECT index_name, stat_name, stat_value, stat_description
FROM mysql.innodb_index_stats
WHERE table_name like 't1' AND stat_name LIKE 'n_diff%';
+------------+--------------+------------+------------------+
| index_name | stat_name | stat_value | stat_description |
+------------+--------------+------------+------------------+
| PRIMARY | n_diff_pfx01 | 1 | a |
| PRIMARY | n_diff_pfx02 | 5 | a,b |
| i1 | n_diff_pfx01 | 1 | c |
| i1 | n_diff_pfx02 | 2 | c,d |
| i1 | n_diff_pfx03 | 2 | c,d,a |
| i1 | n_diff_pfx04 | 5 | c,d,a,b |
| i2uniq | n_diff_pfx01 | 2 | e |
| i2uniq | n_diff_pfx02 | 5 | e,f |
+------------+--------------+------------+------------------+
For the PRIMARY
index, there are two n_diff%
rows. The number of rows is equal to the number of columns in the index.
For nonunique indexes, InnoDB
appends the columns of the primary key.
-
Where
index_name
=PRIMARY
andstat_name
=n_diff_pfx01
, thestat_value
is1
, which indicates that there is a single distinct value in the first column of the index (columna
). The number of distinct values in columna
is confirmed by viewing the data in columna
in tablet1
, in which there is a single distinct value (1
). The counted column (a
) is shown in thestat_description
column of the result set. -
Where
index_name
=PRIMARY
andstat_name
=n_diff_pfx02
, thestat_value
is5
, which indicates that there are five distinct values in the two columns of the index (a,b
). The number of distinct values in columnsa
andb
is confirmed by viewing the data in columnsa
andb
in tablet1
, in which there are five distinct values: (1,1
), (1,2
), (1,3
), (1,4
) and (1,5
). The counted columns (a,b
) are shown in thestat_description
column of the result set.
For the secondary index (i1
), there are four n_diff%
rows. Only two columns are defined for the secondary index (c,d
) but there are four n_diff%
rows for the secondary index because InnoDB
suffixes all nonunique indexes with the primary key. As a result, there are four n_diff%
rows instead of two to account for the both the secondary index columns (c,d
) and the primary key columns (a,b
).
-
Where
index_name
=i1
andstat_name
=n_diff_pfx01
, thestat_value
is1
, which indicates that there is a single distinct value in the first column of the index (columnc
). The number of distinct values in columnc
is confirmed by viewing the data in columnc
in tablet1
, in which there is a single distinct value: (10
). The counted column (c
) is shown in thestat_description
column of the result set. -
Where
index_name
=i1
andstat_name
=n_diff_pfx02
, thestat_value
is2
, which indicates that there are two distinct values in the first two columns of the index (c,d
). The number of distinct values in columnsc
and
is confirmed by viewing the data in columnsc
andd
in tablet1
, in which there are two distinct values: (10,11
) and (10,12
). The counted columns (c,d
) are shown in thestat_description
column of the result set. -
Where
index_name
=i1
andstat_name
=n_diff_pfx03
, thestat_value
is2
, which indicates that there are two distinct values in the first three columns of the index (c,d,a
). The number of distinct values in columnsc
,d
, anda
is confirmed by viewing the data in columnc
,d
, anda
in tablet1
, in which there are two distinct values: (10,11,1
) and (10,12,1
). The counted columns (c,d,a
) are shown in thestat_description
column of the result set. -
Where
index_name
=i1
andstat_name
=n_diff_pfx04
, thestat_value
is5
, which indicates that there are five distinct values in the four columns of the index (c,d,a,b
). The number of distinct values in columnsc
,d
,a
andb
is confirmed by viewing the data in columnsc
,d
,a
, andb
in tablet1
, in which there are five distinct values: (10,11,1,1
), (10,11,1,2
), (10,11,1,3
), (10,12,1,4
), and (10,12,1,5
). The counted columns (c,d,a,b
) are shown in thestat_description
column of the result set.
For the unique index (i2uniq
), there are two n_diff%
rows.
-
Where
index_name
=i2uniq
andstat_name
=n_diff_pfx01
, thestat_value
is2
, which indicates that there are two distinct values in the first column of the index (columne
). The number of distinct values in columne
is confirmed by viewing the data in columne
in tablet1
, in which there are two distinct values: (100
) and (200
). The counted column (e
) is shown in thestat_description
column of the result set. -
Where
index_name
=i2uniq
andstat_name
=n_diff_pfx02
, thestat_value
is5
, which indicates that there are five distinct values in the two columns of the index (e,f
). The number of distinct values in columnse
andf
is confirmed by viewing the data in columnse
andf
in tablet1
, in which there are five distinct values: (100,101
), (200,102
), (100,103
), (200,104
), and (100,105
). The counted columns (e,f
) are shown in thestat_description
column of the result set.
You can retrieve the index size for tables, partitions, or subpartitions can using the innodb_index_stats
table. In the following example, index sizes are retrieved for table t1
. For a definition of table t1
and corresponding index statistics, see Section 17.8.10.1.6, “InnoDB Persistent Statistics Tables Example”.
mysql> SELECT SUM(stat_value) pages, index_name,
SUM(stat_value)*@@innodb_page_size size
FROM mysql.innodb_index_stats WHERE table_name='t1'
AND stat_name = 'size' GROUP BY index_name;
+-------+------------+-------+
| pages | index_name | size |
+-------+------------+-------+
| 1 | PRIMARY | 16384 |
| 1 | i1 | 16384 |
| 1 | i2uniq | 16384 |
+-------+------------+-------+
For partitions or subpartitions, you can use the same query with a modified WHERE
clause to retrieve index sizes. For example, the following query retrieves index sizes for partitions of table t1
:
mysql> SELECT SUM(stat_value) pages, index_name,
SUM(stat_value)*@@innodb_page_size size
FROM mysql.innodb_index_stats WHERE table_name like 't1#P%'
AND stat_name = 'size' GROUP BY index_name;