MySQL has many operators and functions that return a string. This section answers the question: What is the character set and collation of such a string?
For simple functions that take string input and return a string result as output, the output's character set and collation are the same as those of the principal input value. For example, UPPER(
returns a string with the same character string and collation as X
)X
. The same applies for INSTR()
, LCASE()
, LOWER()
, LTRIM()
, MID()
, REPEAT()
, REPLACE()
, REVERSE()
, RIGHT()
, RPAD()
, RTRIM()
, SOUNDEX()
, SUBSTRING()
, TRIM()
, UCASE()
, and UPPER()
.
The REPLACE()
function, unlike all other functions, always ignores the collation of the string input and performs a case-sensitive comparison.
If a string input or function result is a binary string, the string has the binary
character set and collation. This can be checked by using the CHARSET()
and COLLATION()
functions, both of which return binary
for a binary string argument:
mysql> SELECT CHARSET(BINARY 'a'), COLLATION(BINARY 'a');
+---------------------+-----------------------+
| CHARSET(BINARY 'a') | COLLATION(BINARY 'a') |
+---------------------+-----------------------+
| binary | binary |
+---------------------+-----------------------+
For operations that combine multiple string inputs and return a single string output, the “aggregation rules” of standard SQL apply for determining the collation of the result:
-
If an explicit
COLLATE
occurs, useY
Y
. -
If explicit
COLLATE
andY
COLLATE
occur, raise an error.Z
-
Otherwise, if all collations are
Y
, useY
. -
Otherwise, the result has no collation.
For example, with CASE ... WHEN a THEN b WHEN b THEN c COLLATE
, the resulting collation is X
ENDX
. The same applies for UNION
, ||
, CONCAT()
, ELT()
, GREATEST()
, IF()
, and LEAST()
.
For operations that convert to character data, the character set and collation of the strings that result from the operations are defined by the character_set_connection
and collation_connection
system variables that determine the default connection character set and collation (see Section 12.4, “Connection Character Sets and Collations”). This applies only to BIN_TO_UUID()
, CAST()
, CONV()
, FORMAT()
, HEX()
, and SPACE()
.
An exception to the preceding principle occurs for expressions for virtual generated columns. In such expressions, the table character set is used for BIN_TO_UUID()
, CONV()
, or HEX()
results, regardless of connection character set.
If there is any question about the character set or collation of the result returned by a string function, use the CHARSET()
or COLLATION()
function to find out:
mysql> SELECT USER(), CHARSET(USER()), COLLATION(USER());
+----------------+-----------------+--------------------+
| USER() | CHARSET(USER()) | COLLATION(USER()) |
+----------------+-----------------+--------------------+
| test@localhost | utf8mb3 | utf8mb3_general_ci |
+----------------+-----------------+--------------------+
mysql> SELECT CHARSET(COMPRESS('abc')), COLLATION(COMPRESS('abc'));
+--------------------------+----------------------------+
| CHARSET(COMPRESS('abc')) | COLLATION(COMPRESS('abc')) |
+--------------------------+----------------------------+
| binary | binary |
+--------------------------+----------------------------+