Documentation Home
MySQL 8.4 Reference Manual
Related Documentation Download this Manual
PDF (US Ltr) - 39.8Mb
PDF (A4) - 39.9Mb
Man Pages (TGZ) - 257.9Kb
Man Pages (Zip) - 364.9Kb
Info (Gzip) - 4.0Mb
Info (Zip) - 4.0Mb


MySQL 8.4 Reference Manual  /  Character Sets, Collations, Unicode  /  Character Sets and Collations in MySQL

12.2 MySQL 中的字符集和排序规则

MySQL Server 支持多种字符集,包括几个 Unicode 字符集。要显示可用的字符集,可以使用 INFORMATION_SCHEMA CHARACTER_SETS 表或SHOW CHARACTER SET 语句。以下是一部分列表。更多信息,请见 第12.10节,“支持的字符集和排序规则”.

Press CTRL+C to copy
mysql> SHOW CHARACTER SET; +----------+---------------------------------+---------------------+--------+ | Charset | Description | Default collation | Maxlen | +----------+---------------------------------+---------------------+--------+ | big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 | | binary | Binary pseudo charset | binary | 1 | ... | latin1 | cp1252 West European | latin1_swedish_ci | 1 | ... | ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 | ... | utf8mb3 | UTF-8 Unicode | utf8mb3_general_ci | 3 | | utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci | 4 | ...

默认情况下,SHOW CHARACTER SET 语句显示所有可用的字符集。它可以带有可选的LIKEWHERE 子句,指示要匹配的字符集名称。以下示例显示一些 Unicode 字符集(基于 Unicode 转换格式):

Press CTRL+C to copy
mysql> SHOW CHARACTER SET LIKE 'utf%'; +---------+------------------+--------------------+--------+ | Charset | Description | Default collation | Maxlen | +---------+------------------+--------------------+--------+ | utf16 | UTF-16 Unicode | utf16_general_ci | 4 | | utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 | | utf32 | UTF-32 Unicode | utf32_general_ci | 4 | | utf8mb3 | UTF-8 Unicode | utf8mb3_general_ci | 3 | | utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci | 4 | +---------+------------------+--------------------+--------+

给定的字符集总是至少有一个排序规则,多数字符集都有几个。要列出字符集的显示排序规则,可以使用 INFORMATION_SCHEMA COLLATIONS 表或SHOW COLLATION 语句。

默认情况下,SHOW COLLATION 语句显示所有可用的排序规则。它可以带一个可选的LIKEWHERE 子句,指示要显示哪些排序规则名称。例如,要查看默认字符集utf8mb4的排序规则,可以使用以下语句:

Press CTRL+C to copy
mysql> SHOW COLLATION WHERE Charset = 'utf8mb4'; +----------------------------+---------+-----+---------+----------+---------+---------------+ | Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | +----------------------------+---------+-----+---------+----------+---------+---------------+ | utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes | Yes | 0 | NO PAD | | utf8mb4_0900_as_ci | utf8mb4 | 305 | | Yes | 0 | NO PAD | | utf8mb4_0900_as_cs | utf8mb4 | 278 | | Yes | 0 | NO PAD | | utf8mb4_0900_bin | utf8mb4 | 309 | | Yes | 1 | NO PAD | | utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 | PAD SPACE | | utf8mb4_croatian_ci | utf8mb4 | 245 | | Yes | 8 | PAD SPACE | | utf8mb4_cs_0900_ai_ci | utf8mb4 | 266 | | Yes | 0 | NO PAD | | utf8mb4_cs_0900_as_cs | utf8mb4 | 289 | | Yes | 0 | NO PAD | | utf8mb4_czech_ci | utf8mb4 | 234 | | Yes | 8 | PAD SPACE | | utf8mb4_danish_ci | utf8mb4 | 235 | | Yes | 8 | PAD SPACE | | utf8mb4_da_0900_ai_ci | utf8mb4 | 267 | | Yes | 0 | NO PAD | | utf8mb4_da_0900_as_cs | utf8mb4 | 290 | | Yes | 0 | NO PAD | | utf8mb4_de_pb_0900_ai_ci | utf8mb4 | 256 | | Yes | 0 | NO PAD | | utf8mb4_de_pb_0900_as_cs | utf8mb4 | 279 | | Yes | 0 | NO PAD | | utf8mb4_eo_0900_ai_ci | utf8mb4 | 273 | | Yes | 0 | NO PAD | | utf8mb4_eo_0900_as_cs | utf8mb4 | 296 | | Yes | 0 | NO PAD | | utf8mb4_esperanto_ci | utf8mb4 | 241 | | Yes | 8 | PAD SPACE | | utf8mb4_estonian_ci | utf8mb4 | 230 | | Yes | 8 | PAD SPACE | | utf8mb4_es_0900_ai_ci | utf8mb4 | 263 | | Yes | 0 | NO PAD | | utf8mb4_es_0900_as_cs | utf8mb4 | 286 | | Yes | 0 | NO PAD | | utf8mb4_es_trad_0900_ai_ci | utf8mb4 | 270 | | Yes | 0 | NO PAD | | utf8mb4_es_trad_0900_as_cs | utf8mb4 | 293 | | Yes | 0 | NO PAD | | utf8mb4_et_0900_ai_ci | utf8mb4 | 262 | | Yes | 0 | NO PAD | | utf8mb4_et_0900_as_cs | utf8mb4 | 285 | | Yes | 0 | NO PAD | | utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 | PAD SPACE | | utf8mb4_german2_ci | utf8mb4 | 244 | | Yes | 8 | PAD SPACE | | utf8mb4_hr_0900_ai_ci | utf8mb4 | 275 | | Yes | 0 | NO PAD | | utf8mb4_hr_0900_as_cs | utf8mb4 | 298 | | Yes | 0 | NO PAD | | utf8mb4_hungarian_ci | utf8mb4 | 242 | | Yes | 8 | PAD SPACE | | utf8mb4_hu_0900_ai_ci | utf8mb4 | 274 | | Yes | 0 | NO PAD | | utf8mb4_hu_0900_as_cs | utf8mb4 | 297 | | Yes | 0 | NO PAD | | utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 | PAD SPACE | | utf8mb4_is_0900_ai_ci | utf8mb4 | 257 | | Yes | 0 | NO PAD | | utf8mb4_is_0900_as_cs | utf8mb4 | 280 | | Yes | 0 | NO PAD | | utf8mb4_ja_0900_as_cs | utf8mb4 | 303 | | Yes | 0 | NO PAD | | utf8mb4_ja_0900_as_cs_ks | utf8mb4 | 304 | | Yes | 24 | NO PAD | | utf8mb4_latvian_ci | utf8mb4 | 226 | | Yes | 8 | PAD SPACE | | utf8mb4_la_0900_ai_ci | utf8mb4 | 271 | | Yes | 0 | NO PAD | | utf8mb4_la_0900_as_cs | utf8mb4 | 294 | | Yes | 0 | NO PAD | | utf8mb4_lithuanian_ci | utf8mb4 | 236 | | Yes | 8 | PAD SPACE | | utf8mb4_lt_0900_ai_ci | utf8mb4 | 268 | | Yes | 0 | NO PAD | | utf8mb4_lt_0900_as_cs | utf8mb4 | 291 | | Yes | 0 | NO PAD | | utf8mb4_lv_0900_ai_ci | utf8mb4 | 258 | | Yes | 0 | NO PAD | | utf8mb4_lv_0900_as_cs | utf8mb4 | 281 | | Yes | 0 | NO PAD | | utf8mb4_persian_ci | utf8mb4 | 240 | | Yes | 8 | PAD SPACE | | utf8mb4_pl_0900_ai_ci | utf8mb4 | 261 | | Yes | 0 | NO PAD | | utf8mb4_pl_0900_as_cs | utf8mb4 | 284 | | Yes | 0 | NO PAD | | utf8mb4_polish_ci | utf8mb4 | 229 | | Yes | 8 | PAD SPACE | | utf8mb4_romanian_ci | utf8mb4 | 227 | | Yes | 8 | PAD SPACE | | utf8mb4_roman_ci | utf8mb4 | 239 | | Yes | 8 | PAD SPACE | | utf8mb4_ro_0900_ai_ci | utf8mb4 | 259 | | Yes | 0 | NO PAD | | utf8mb4_ro_0900_as_cs | utf8mb4 | 282 | | Yes | 0 | NO PAD | | utf8mb4_ru_0900_ai_ci | utf8mb4 | 306 | | Yes | 0 | NO PAD | | utf8mb4_ru_0900_as_cs | utf8mb4 | 307 | | Yes | 0 | NO PAD | | utf8mb4_sinhala_ci | utf8mb4 | 243 | | Yes | 8 | PAD SPACE | | utf8mb4_sk_0900_ai_ci | utf8mb4 | 269 | | Yes | 0 | NO PAD | | utf8mb4_sk_0900_as_cs | utf8mb4 | 292 | | Yes | 0 | NO PAD | | utf8mb4_slovak_ci | utf8mb4 | 237 | | Yes | 8 | PAD SPACE | | utf8mb4_slovenian_ci | utf8mb4 | 228 | | Yes | 8 | PAD SPACE | | utf8mb4_sl_0900_ai_ci | utf8mb4 | 260 | | Yes | 0 | NO PAD | | utf8mb4_sl_0900_as_cs | utf8mb4 | 283 | | Yes | 0 | NO PAD | | utf8mb4_spanish2_ci | utf8mb4 | 238 | | Yes | 8 | PAD SPACE | | utf8mb4_spanish_ci | utf8mb4 | 231 | | Yes | 8 | PAD SPACE | | utf8mb4_sv_0900_ai_ci | utf8mb4 | 264 | | Yes | 0 | NO PAD | | utf8mb4_sv_0900_as_cs | utf8mb4 | 287 | | Yes | 0 | NO PAD | | utf8mb4_swedish_ci | utf8mb4 | 232 | | Yes | 8 | PAD SPACE | | utf8mb4_tr_0900_ai_ci | utf8mb4 | 265 | | Yes | 0 | NO PAD | | utf8mb4_tr_0900_as_cs | utf8mb4 | 288 | | Yes | 0 | NO PAD | | utf8mb4_turkish_ci | utf8mb4 | 233 | | Yes | 8 | PAD SPACE | | utf8mb4_unicode_520_ci | utf8mb4 | 246 | | Yes | 8 | PAD SPACE | | utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 | PAD SPACE | | utf8mb4_vietnamese_ci | utf8mb4 | 247 | | Yes | 8 | PAD SPACE | | utf8mb4_vi_0900_ai_ci | utf8mb4 | 277 | | Yes | 0 | NO PAD | | utf8mb4_vi_0900_as_cs | utf8mb4 | 300 | | Yes | 0 | NO PAD | | utf8mb4_zh_0900_as_cs | utf8mb4 | 308 | | Yes | 0 | NO PAD | +----------------------------+---------+-----+---------+----------+---------+---------------+

关于这些排序规则的更多信息,请参见第12.10.1节,“Unicode 字符集”

排序规则具有这些一般特征:

  • 两个不同的字符集不能拥有相同的排序规则。

  • 每个字符集都有一个默认排序规则。例如,utf8mb4latin1 的默认排序规则分别是 utf8mb4_0900_ai_cilatin1_swedish_ciINFORMATION_SCHEMACHARACTER_SETSSHOW CHARACTER SET 语句指示每个字符集的默认排序规则。INFORMATION_SCHEMACOLLATIONSSHOW COLLATION 语句有一个列,指示每个排序规则是否是其字符集的默认排序规则(如果是,则为 Yes,否则为空)。

  • 排序规则名称以与之关联的字符集名称开头,通常后跟一个或多个表示其他排序特征的后缀。关于命名约定,请参见第12.3.1节,“排序规则命名约定”

当字符集有多个排序规则时,可能不清楚哪个排序规则最适合给定的应用程序。为了避免选择不合适的排序规则,请使用代表数据值进行一些比较,以确保给定的排序规则按预期方式排序值。