JeeStudy 发表于 2020-6-16 21:45:40

深入理解MYSQL 字符集(Charset)和排序规则(Collation)

    字符集(character set)是一组符号和编码。排序规则(collation)是一组用于比较字符集中的字符的规则。让我们通过一个假想字符集的例子来清楚地区分。


    假设我们有一四个字母:A,B,a,b。我们给每个字母一个数字:A = 0,B = 1,a = 2,b =3。字母A是一个符号,数字0是A的编码,所有四个字母及其编码的组合为字符集。
    假设我们要比较两个字符串值A和B。最简单的方法是查看编码:A表示0,B表示1。因为0小于1,所以我们说A小于B。
    我们刚才所做的是将排序规则应用于字符集。排序规则是一组规则(在这种情况下,只有一个规则):“比较编码”。
    我们将所有可能的排序规则中最简单的一种称为二进制排序规则。
    但是,如果我们想说小写字母和大写字母是等效的呢?
    那么我们将至少有两个规则:
    (1)将小写字母a和b视为与A和B等效;
    (2)然后比较编码。
    我们称此为不区分大小写的排序规则。它比二进制排序规则要复杂一些。
    在现实生活中,大多数字符集都有许多字符:不仅是A和B,而且还有所有字母,有时甚至是多个字母或带有数千个字符的东部书写系统,以及许多特殊符号和标点符号。同样在现实生活中,大多数排序规则都有很多规则,不仅用于区分字母大小写,而且还用于区分是否区分重音(“重音”是附加在字符上的标记,如德语Ö),以及用于多字符映射(例如在两个德国归类之一中Ö= OE的规则)。

MySQL可以为您做这些事情:
    使用各种字符集存储字符串。
    使用各种排序规则比较字符串。
    在同一服务器,同一数据库甚至同一表中混合使用具有不同字符集或排序规则的字符串。

    在任何级别启用字符集和排序规则的规范。


若要有效使用这些功能,您必须知道可用的字符集和排序规则,如何更改默认值以及它们如何影响字符串运算符和函数的行为。




MySQL为您提供了SHOW CHARACTER SET语句或者show charset;,该语句可让您获取字符集的默认排序规则,如下所示:

CharsetDescriptionDefaultcollationMaxlen
big5Big5 TraditionalChinesebig5_chinese_ci2
8-DecDEC West Europeandec8_swedish_ci1
cp850DOS West Europeancp850_general_ci1
hp8HP West Europeanhp8_english_ci1
koi8rKOI8-R Relcom Russiankoi8r_general_ci1
latin1cp1252 West Europeanlatin1_swedish_ci1
latin2ISO 8859-2 CentralEuropeanlatin2_general_ci1
swe77bit Swedishswe7_swedish_ci1
asciiUS ASCIIascii_general_ci1
ujisEUC-JP Japaneseujis_japanese_ci3
sjisShift-JIS Japanesesjis_japanese_ci2
hebrewISO 8859-8 Hebrewhebrew_general_ci1
tis620TIS620 Thaitis620_thai_ci1
euckrEUC-KR Koreaneuckr_korean_ci2
koi8uKOI8-U Ukrainiankoi8u_general_ci1
gb2312GB2312 SimplifiedChinesegb2312_chinese_ci2
greekISO 8859-7 Greekgreek_general_ci1
cp1250Windows CentralEuropeancp1250_general_ci1
gbkGBK SimplifiedChinesegbk_chinese_ci2
latin5ISO 8859-9 Turkishlatin5_turkish_ci1
armscii8ARMSCII-8 Armenianarmscii8_general_ci1
utf8UTF-8 Unicodeutf8_general_ci3
ucs2UCS-2 Unicodeucs2_general_ci2
cp866DOS Russiancp866_general_ci1
keybcs2DOS KamenickyCzech-Slovakkeybcs2_general_ci1
macceMac Central Europeanmacce_general_ci1
macromanMac West Europeanmacroman_general_ci1
cp852DOS Central Europeancp852_general_ci1
latin7ISO 8859-13 Balticlatin7_general_ci1
utf8mb4UTF-8 Unicodeutf8mb4_general_ci4
cp1251Windows Cyrilliccp1251_general_ci1
utf16UTF-16 Unicodeutf16_general_ci4
utf16leUTF-16LE Unicodeutf16le_general_ci4
cp1256Windows Arabiccp1256_general_ci1
cp1257Windows Balticcp1257_general_ci1
utf32UTF-32 Unicodeutf32_general_ci4
binaryBinary pseudo charsetbinary1
geostd8GEOSTD8 Georgiangeostd8_general_ci1
cp932SJIS for WindowsJapanesecp932_japanese_ci2
eucjpmsUJIS for WindowsJapaneseeucjpms_japanese_ci3
gb18030China NationalStandard GB18030gb18030_chinese_ci4
按照约定,字符集的排序规则以字符集名称开头,并以_ci(不区分大小写)_cs(区分大小写)或_bin(二进制)结束。

要获取给定字符集的所有排序规则,请使用SHOW COLLATION语句,如下所示:
CollationCharsetIdDefaultCompiledSortlen
big5_chinese_cibig51YesYes1
big5_binbig584
Yes1
dec8_swedish_ci8-Dec3YesYes1
dec8_bin8-Dec69
Yes1
cp850_general_cicp8504YesYes1
cp850_bincp85080
Yes1
hp8_english_cihp86YesYes1
hp8_binhp872
Yes1
koi8r_general_cikoi8r7YesYes1
koi8r_binkoi8r74
Yes1
latin1_german1_cilatin15
Yes1
latin1_swedish_cilatin18YesYes1
latin1_danish_cilatin115
Yes1
latin1_german2_cilatin131
Yes2
latin1_binlatin147
Yes1
latin1_general_cilatin148
Yes1
latin1_general_cslatin149
Yes1
latin1_spanish_cilatin194
Yes1
latin2_czech_cslatin22
Yes4
latin2_general_cilatin29YesYes1
latin2_hungarian_cilatin221
Yes1
latin2_croatian_cilatin227
Yes1
latin2_binlatin277
Yes1
swe7_swedish_ciswe710YesYes1
swe7_binswe782
Yes1
ascii_general_ciascii11YesYes1
ascii_binascii65
Yes1
ujis_japanese_ciujis12YesYes1
ujis_binujis91
Yes1
sjis_japanese_cisjis13YesYes1
sjis_binsjis88
Yes1
hebrew_general_cihebrew16YesYes1
hebrew_binhebrew71
Yes1
tis620_thai_citis62018YesYes4
tis620_bintis62089
Yes1
euckr_korean_cieuckr19YesYes1
euckr_bineuckr85
Yes1
koi8u_general_cikoi8u22YesYes1
koi8u_binkoi8u75
Yes1
gb2312_chinese_cigb231224YesYes1
gb2312_bingb231286
Yes1
greek_general_cigreek25YesYes1
greek_bingreek70
Yes1
cp1250_general_cicp125026YesYes1
cp1250_czech_cscp125034
Yes2
cp1250_croatian_cicp125044
Yes1
cp1250_bincp125066
Yes1
cp1250_polish_cicp125099
Yes1
gbk_chinese_cigbk28YesYes1
gbk_bingbk87
Yes1
latin5_turkish_cilatin530YesYes1
latin5_binlatin578
Yes1
armscii8_general_ciarmscii832YesYes1
armscii8_binarmscii864
Yes1
utf8_general_ciutf833YesYes1
utf8_binutf883
Yes1
utf8_unicode_ciutf8192
Yes8
utf8_icelandic_ciutf8193
Yes8
utf8_latvian_ciutf8194
Yes8
utf8_romanian_ciutf8195
Yes8
utf8_slovenian_ciutf8196
Yes8
utf8_polish_ciutf8197
Yes8
utf8_estonian_ciutf8198
Yes8
utf8_spanish_ciutf8199
Yes8
utf8_swedish_ciutf8200
Yes8
utf8_turkish_ciutf8201
Yes8
utf8_czech_ciutf8202
Yes8
utf8_danish_ciutf8203
Yes8
utf8_lithuanian_ciutf8204
Yes8
utf8_slovak_ciutf8205
Yes8
utf8_spanish2_ciutf8206
Yes8
utf8_roman_ciutf8207
Yes8
utf8_persian_ciutf8208
Yes8
utf8_esperanto_ciutf8209
Yes8
utf8_hungarian_ciutf8210
Yes8
utf8_sinhala_ciutf8211
Yes8
utf8_german2_ciutf8212
Yes8
utf8_croatian_ciutf8213
Yes8
utf8_unicode_520_ciutf8214
Yes8
utf8_vietnamese_ciutf8215
Yes8
utf8_general_mysql500_ciutf8223
Yes1
ucs2_general_ciucs235YesYes1
ucs2_binucs290
Yes1
ucs2_unicode_ciucs2128
Yes8
ucs2_icelandic_ciucs2129
Yes8
ucs2_latvian_ciucs2130
Yes8
ucs2_romanian_ciucs2131
Yes8
ucs2_slovenian_ciucs2132
Yes8
ucs2_polish_ciucs2133
Yes8
ucs2_estonian_ciucs2134
Yes8
ucs2_spanish_ciucs2135
Yes8
ucs2_swedish_ciucs2136
Yes8
ucs2_turkish_ciucs2137
Yes8
ucs2_czech_ciucs2138
Yes8
ucs2_danish_ciucs2139
Yes8
ucs2_lithuanian_ciucs2140
Yes8
ucs2_slovak_ciucs2141
Yes8
ucs2_spanish2_ciucs2142
Yes8
ucs2_roman_ciucs2143
Yes8
ucs2_persian_ciucs2144
Yes8
ucs2_esperanto_ciucs2145
Yes8
ucs2_hungarian_ciucs2146
Yes8
ucs2_sinhala_ciucs2147
Yes8
ucs2_german2_ciucs2148
Yes8
ucs2_croatian_ciucs2149
Yes8
ucs2_unicode_520_ciucs2150
Yes8
ucs2_vietnamese_ciucs2151
Yes8
ucs2_general_mysql500_ciucs2159
Yes1
cp866_general_cicp86636YesYes1
cp866_bincp86668
Yes1
keybcs2_general_cikeybcs237YesYes1
keybcs2_binkeybcs273
Yes1
macce_general_cimacce38YesYes1
macce_binmacce43
Yes1
macroman_general_cimacroman39YesYes1
macroman_binmacroman53
Yes1
cp852_general_cicp85240YesYes1
cp852_bincp85281
Yes1
latin7_estonian_cslatin720
Yes1
latin7_general_cilatin741YesYes1
latin7_general_cslatin742
Yes1
latin7_binlatin779
Yes1
utf8mb4_general_ciutf8mb445YesYes1
utf8mb4_binutf8mb446
Yes1
utf8mb4_unicode_ciutf8mb4224
Yes8
utf8mb4_icelandic_ciutf8mb4225
Yes8
utf8mb4_latvian_ciutf8mb4226
Yes8
utf8mb4_romanian_ciutf8mb4227
Yes8
utf8mb4_slovenian_ciutf8mb4228
Yes8
utf8mb4_polish_ciutf8mb4229
Yes8
utf8mb4_estonian_ciutf8mb4230
Yes8
utf8mb4_spanish_ciutf8mb4231
Yes8
utf8mb4_swedish_ciutf8mb4232
Yes8
utf8mb4_turkish_ciutf8mb4233
Yes8
utf8mb4_czech_ciutf8mb4234
Yes8
utf8mb4_danish_ciutf8mb4235
Yes8
utf8mb4_lithuanian_ciutf8mb4236
Yes8
utf8mb4_slovak_ciutf8mb4237
Yes8
utf8mb4_spanish2_ciutf8mb4238
Yes8
utf8mb4_roman_ciutf8mb4239
Yes8
utf8mb4_persian_ciutf8mb4240
Yes8
utf8mb4_esperanto_ciutf8mb4241
Yes8
utf8mb4_hungarian_ciutf8mb4242
Yes8
utf8mb4_sinhala_ciutf8mb4243
Yes8
utf8mb4_german2_ciutf8mb4244
Yes8
utf8mb4_croatian_ciutf8mb4245
Yes8
utf8mb4_unicode_520_ciutf8mb4246
Yes8
utf8mb4_vietnamese_ciutf8mb4247
Yes8
cp1251_bulgarian_cicp125114
Yes1
cp1251_ukrainian_cicp125123
Yes1
cp1251_bincp125150
Yes1
cp1251_general_cicp125151YesYes1
cp1251_general_cscp125152
Yes1
utf16_general_ciutf1654YesYes1
utf16_binutf1655
Yes1
utf16_unicode_ciutf16101
Yes8
utf16_icelandic_ciutf16102
Yes8
utf16_latvian_ciutf16103
Yes8
utf16_romanian_ciutf16104
Yes8
utf16_slovenian_ciutf16105
Yes8
utf16_polish_ciutf16106
Yes8
utf16_estonian_ciutf16107
Yes8
utf16_spanish_ciutf16108
Yes8
utf16_swedish_ciutf16109
Yes8
utf16_turkish_ciutf16110
Yes8
utf16_czech_ciutf16111
Yes8
utf16_danish_ciutf16112
Yes8
utf16_lithuanian_ciutf16113
Yes8
utf16_slovak_ciutf16114
Yes8
utf16_spanish2_ciutf16115
Yes8
utf16_roman_ciutf16116
Yes8
utf16_persian_ciutf16117
Yes8
utf16_esperanto_ciutf16118
Yes8
utf16_hungarian_ciutf16119
Yes8
utf16_sinhala_ciutf16120
Yes8
utf16_german2_ciutf16121
Yes8
utf16_croatian_ciutf16122
Yes8
utf16_unicode_520_ciutf16123
Yes8
utf16_vietnamese_ciutf16124
Yes8
utf16le_general_ciutf16le56YesYes1
utf16le_binutf16le62
Yes1
cp1256_general_cicp125657YesYes1
cp1256_bincp125667
Yes1
cp1257_lithuanian_cicp125729
Yes1
cp1257_bincp125758
Yes1
cp1257_general_cicp125759YesYes1
utf32_general_ciutf3260YesYes1
utf32_binutf3261
Yes1
utf32_unicode_ciutf32160
Yes8
utf32_icelandic_ciutf32161
Yes8
utf32_latvian_ciutf32162
Yes8
utf32_romanian_ciutf32163
Yes8
utf32_slovenian_ciutf32164
Yes8
utf32_polish_ciutf32165
Yes8
utf32_estonian_ciutf32166
Yes8
utf32_spanish_ciutf32167
Yes8
utf32_swedish_ciutf32168
Yes8
utf32_turkish_ciutf32169
Yes8
utf32_czech_ciutf32170
Yes8
utf32_danish_ciutf32171
Yes8
utf32_lithuanian_ciutf32172
Yes8
utf32_slovak_ciutf32173
Yes8
utf32_spanish2_ciutf32174
Yes8
utf32_roman_ciutf32175
Yes8
utf32_persian_ciutf32176
Yes8
utf32_esperanto_ciutf32177
Yes8
utf32_hungarian_ciutf32178
Yes8
utf32_sinhala_ciutf32179
Yes8
utf32_german2_ciutf32180
Yes8
utf32_croatian_ciutf32181
Yes8
utf32_unicode_520_ciutf32182
Yes8
utf32_vietnamese_ciutf32183
Yes8
binarybinary63YesYes1
geostd8_general_cigeostd892YesYes1
geostd8_bingeostd893
Yes1
cp932_japanese_cicp93295YesYes1
cp932_bincp93296
Yes1
eucjpms_japanese_cieucjpms97YesYes1
eucjpms_bineucjpms98
Yes1
gb18030_chinese_cigb18030248YesYes2
gb18030_bingb18030249
Yes1
gb18030_unicode_520_cigb18030250
Yes8





    排序规则是一组定义如何比较和排序字符串的规则。 MySQL中的每个排序规则都属于一个字符集。每个字符集至少具有一个排序规则,大多数字符集具有两个或多个排序规则。排序规则根据权重对字符排序。字符集中的每个字符都映射到权重。权重相等的字符比较相等,权重不相等的字符根据其权重的相对大小进行比较。 WEIGHT_STRING()函数可用于查看字符串中字符的权重。它返回以表示重量的值是一个二进制字符串,因此使用HEX(WEIGHT_STRING(str))以可打印的形式显示重量很方便。下面的示例显示,如果“ AaBb”中的字母为非二进制不区分大小写的字符串,则其字母大小写的权重不会有所不同,但如果它是二进制字符串,则权重不会有所不同:



mysql> SELECT HEX(WEIGHT_STRING('AaBb' COLLATE latin1_swedish_ci));
+------------------------------------------------------+
| HEX(WEIGHT_STRING('AaBb' COLLATE latin1_swedish_ci)) |
+------------------------------------------------------+
| 41414242                                             |
+------------------------------------------------------+
mysql> SELECT HEX(WEIGHT_STRING(BINARY 'AaBb'));
+-----------------------------------+
| HEX(WEIGHT_STRING(BINARY 'AaBb')) |
+-----------------------------------+
| 41614262                        |
+-----------------------------------+




注意:如果您修改现有的排序规则,则可能会影响使用该排序规则的列上的索引的行顺序。在这种情况下,请重建任何此类索引,以避免出现诸如查询结果不正确的问题。






页: [1]
查看完整版本: 深入理解MYSQL 字符集(Charset)和排序规则(Collation)