The utf8mb4
character set has these characteristics:
-
Supports BMP and supplementary characters.
-
Requires a maximum of four bytes per multibyte character.
utf8mb4
contrasts with the utf8mb3
character set, which supports only BMP characters and uses a maximum of three bytes per character:
-
For a BMP character,
utf8mb4
andutf8mb3
have identical storage characteristics: same code values, same encoding, same length. -
For a supplementary character,
utf8mb4
requires four bytes to store it, whereasutf8mb3
cannot store the character at all. When convertingutf8mb3
columns toutf8mb4
, you need not worry about converting supplementary characters because there are none.
utf8mb4
is a superset of utf8mb3
, so for an operation such as the following concatenation, the result has character set utf8mb4
and the collation of utf8mb4_col
:
SELECT CONCAT(utf8mb3_col, utf8mb4_col);
Similarly, the following comparison in the WHERE
clause works according to the collation of utf8mb4_col
:
SELECT * FROM utf8mb3_tbl, utf8mb4_tbl
WHERE utf8mb3_tbl.utf8mb3_col = utf8mb4_tbl.utf8mb4_col;
For information about data type storage as it relates to multibyte character sets, see String Type Storage Requirements.