请选择 进入手机版 | 继续访问电脑版
设为首页收藏本站 JeeStudy企业开发官网①

JEE Study|JAVA EE|企业级开发学习网

 找回密码
 立即加入

QQ登录

只需一步,快速开始

扫一扫,访问微社区

搜索
查看: 299|回复: 0

深入理解MYSQL 字符集(Charset)和排序规则(Collation)

[复制链接]

219

主题

221

帖子

1418

积分

管理员

Rank: 9Rank: 9Rank: 9

积分
1418

最佳新人活跃会员热心会员推广达人宣传达人灌水之王突出贡献优秀版主荣誉管理论坛元老

发表于 2020-6-16 21:45:40 | 显示全部楼层 |阅读模式
    字符集(character set)是一组符号和编码。排序规则(collation是一组用于比较字符集中的字符的规则。让我们通过一个假想字符集的例子来清楚地区分。


    假设我们有一四个字母:A,B,a,b。我们给每个字母一个数字:A = 0,B = 1,a = 2,b =3。字母A是一个符号,数字0是A的编码,所有四个字母及其编码的组合为字符集。
    假设我们要比较两个字符串值A和B。最简单的方法是查看编码:A表示0,B表示1。因为0小于1,所以我们说A小于B。
    我们刚才所做的是将排序规则应用于字符集。排序规则是一组规则(在这种情况下,只有一个规则):“比较编码”。
    我们将所有可能的排序规则中最简单的一种称为二进制排序规则。
    但是,如果我们想说小写字母和大写字母是等效的呢?
    那么我们将至少有两个规则:
    (1)将小写字母a和b视为与A和B等效;
    (2)然后比较编码。
    我们称此为不区分大小写的排序规则。它比二进制排序规则要复杂一些。
    在现实生活中,大多数字符集都有许多字符:不仅是A和B,而且还有所有字母,有时甚至是多个字母或带有数千个字符的东部书写系统,以及许多特殊符号和标点符号。同样在现实生活中,大多数排序规则都有很多规则,不仅用于区分字母大小写,而且还用于区分是否区分重音(“重音”是附加在字符上的标记,如德语Ö),以及用于多字符映射(例如在两个德国归类之一中Ö= OE的规则)。

MySQL可以为您做这些事情:
    使用各种字符集存储字符串。
    使用各种排序规则比较字符串。
    在同一服务器,同一数据库甚至同一表中混合使用具有不同字符集或排序规则的字符串。

    在任何级别启用字符集和排序规则的规范。


若要有效使用这些功能,您必须知道可用的字符集和排序规则,如何更改默认值以及它们如何影响字符串运算符和函数的行为。




MySQL为您提供了SHOW CHARACTER SET语句或者show charset;,该语句可让您获取字符集的默认排序规则,如下所示:
Charset
Description
Default  collation
Maxlen
big5
Big5 Traditional  Chinese
big5_chinese_ci
2
8-Dec
DEC West European
dec8_swedish_ci
1
cp850
DOS West European
cp850_general_ci
1
hp8
HP West European
hp8_english_ci
1
koi8r
KOI8-R Relcom Russian
koi8r_general_ci
1
latin1
cp1252 West European
latin1_swedish_ci
1
latin2
ISO 8859-2 Central  European
latin2_general_ci
1
swe7
7bit Swedish
swe7_swedish_ci
1
ascii
US ASCII
ascii_general_ci
1
ujis
EUC-JP Japanese
ujis_japanese_ci
3
sjis
Shift-JIS Japanese
sjis_japanese_ci
2
hebrew
ISO 8859-8 Hebrew
hebrew_general_ci
1
tis620
TIS620 Thai
tis620_thai_ci
1
euckr
EUC-KR Korean
euckr_korean_ci
2
koi8u
KOI8-U Ukrainian
koi8u_general_ci
1
gb2312
GB2312 Simplified  Chinese
gb2312_chinese_ci
2
greek
ISO 8859-7 Greek
greek_general_ci
1
cp1250
Windows Central  European
cp1250_general_ci
1
gbk
GBK Simplified  Chinese
gbk_chinese_ci
2
latin5
ISO 8859-9 Turkish
latin5_turkish_ci
1
armscii8
ARMSCII-8 Armenian
armscii8_general_ci
1
utf8
UTF-8 Unicode
utf8_general_ci
3
ucs2
UCS-2 Unicode
ucs2_general_ci
2
cp866
DOS Russian
cp866_general_ci
1
keybcs2
DOS Kamenicky  Czech-Slovak
keybcs2_general_ci
1
macce
Mac Central European
macce_general_ci
1
macroman
Mac West European
macroman_general_ci
1
cp852
DOS Central European
cp852_general_ci
1
latin7
ISO 8859-13 Baltic
latin7_general_ci
1
utf8mb4
UTF-8 Unicode
utf8mb4_general_ci
4
cp1251
Windows Cyrillic
cp1251_general_ci
1
utf16
UTF-16 Unicode
utf16_general_ci
4
utf16le
UTF-16LE Unicode
utf16le_general_ci
4
cp1256
Windows Arabic
cp1256_general_ci
1
cp1257
Windows Baltic
cp1257_general_ci
1
utf32
UTF-32 Unicode
utf32_general_ci
4
binary
Binary pseudo charset
binary
1
geostd8
GEOSTD8 Georgian
geostd8_general_ci
1
cp932
SJIS for Windows  Japanese
cp932_japanese_ci
2
eucjpms
UJIS for Windows  Japanese
eucjpms_japanese_ci
3
gb18030
China National  Standard GB18030
gb18030_chinese_ci
4
按照约定,字符集的排序规则以字符集名称开头,并以_ci(不区分大小写)_cs(区分大小写)或_bin(二进制)结束。

要获取给定字符集的所有排序规则,请使用SHOW COLLATION语句,如下所示:
Collation
Charset
Id
Default
Compiled
Sortlen
big5_chinese_ci
big5
1
Yes
Yes
1
big5_bin
big5
84

Yes
1
dec8_swedish_ci
8-Dec
3
Yes
Yes
1
dec8_bin
8-Dec
69

Yes
1
cp850_general_ci
cp850
4
Yes
Yes
1
cp850_bin
cp850
80

Yes
1
hp8_english_ci
hp8
6
Yes
Yes
1
hp8_bin
hp8
72

Yes
1
koi8r_general_ci
koi8r
7
Yes
Yes
1
koi8r_bin
koi8r
74

Yes
1
latin1_german1_ci
latin1
5

Yes
1
latin1_swedish_ci
latin1
8
Yes
Yes
1
latin1_danish_ci
latin1
15

Yes
1
latin1_german2_ci
latin1
31

Yes
2
latin1_bin
latin1
47

Yes
1
latin1_general_ci
latin1
48

Yes
1
latin1_general_cs
latin1
49

Yes
1
latin1_spanish_ci
latin1
94

Yes
1
latin2_czech_cs
latin2
2

Yes
4
latin2_general_ci
latin2
9
Yes
Yes
1
latin2_hungarian_ci
latin2
21

Yes
1
latin2_croatian_ci
latin2
27

Yes
1
latin2_bin
latin2
77

Yes
1
swe7_swedish_ci
swe7
10
Yes
Yes
1
swe7_bin
swe7
82

Yes
1
ascii_general_ci
ascii
11
Yes
Yes
1
ascii_bin
ascii
65

Yes
1
ujis_japanese_ci
ujis
12
Yes
Yes
1
ujis_bin
ujis
91

Yes
1
sjis_japanese_ci
sjis
13
Yes
Yes
1
sjis_bin
sjis
88

Yes
1
hebrew_general_ci
hebrew
16
Yes
Yes
1
hebrew_bin
hebrew
71

Yes
1
tis620_thai_ci
tis620
18
Yes
Yes
4
tis620_bin
tis620
89

Yes
1
euckr_korean_ci
euckr
19
Yes
Yes
1
euckr_bin
euckr
85

Yes
1
koi8u_general_ci
koi8u
22
Yes
Yes
1
koi8u_bin
koi8u
75

Yes
1
gb2312_chinese_ci
gb2312
24
Yes
Yes
1
gb2312_bin
gb2312
86

Yes
1
greek_general_ci
greek
25
Yes
Yes
1
greek_bin
greek
70

Yes
1
cp1250_general_ci
cp1250
26
Yes
Yes
1
cp1250_czech_cs
cp1250
34

Yes
2
cp1250_croatian_ci
cp1250
44

Yes
1
cp1250_bin
cp1250
66

Yes
1
cp1250_polish_ci
cp1250
99

Yes
1
gbk_chinese_ci
gbk
28
Yes
Yes
1
gbk_bin
gbk
87

Yes
1
latin5_turkish_ci
latin5
30
Yes
Yes
1
latin5_bin
latin5
78

Yes
1
armscii8_general_ci
armscii8
32
Yes
Yes
1
armscii8_bin
armscii8
64

Yes
1
utf8_general_ci
utf8
33
Yes
Yes
1
utf8_bin
utf8
83

Yes
1
utf8_unicode_ci
utf8
192

Yes
8
utf8_icelandic_ci
utf8
193

Yes
8
utf8_latvian_ci
utf8
194

Yes
8
utf8_romanian_ci
utf8
195

Yes
8
utf8_slovenian_ci
utf8
196

Yes
8
utf8_polish_ci
utf8
197

Yes
8
utf8_estonian_ci
utf8
198

Yes
8
utf8_spanish_ci
utf8
199

Yes
8
utf8_swedish_ci
utf8
200

Yes
8
utf8_turkish_ci
utf8
201

Yes
8
utf8_czech_ci
utf8
202

Yes
8
utf8_danish_ci
utf8
203

Yes
8
utf8_lithuanian_ci
utf8
204

Yes
8
utf8_slovak_ci
utf8
205

Yes
8
utf8_spanish2_ci
utf8
206

Yes
8
utf8_roman_ci
utf8
207

Yes
8
utf8_persian_ci
utf8
208

Yes
8
utf8_esperanto_ci
utf8
209

Yes
8
utf8_hungarian_ci
utf8
210

Yes
8
utf8_sinhala_ci
utf8
211

Yes
8
utf8_german2_ci
utf8
212

Yes
8
utf8_croatian_ci
utf8
213

Yes
8
utf8_unicode_520_ci
utf8
214

Yes
8
utf8_vietnamese_ci
utf8
215

Yes
8
utf8_general_mysql500_ci
utf8
223

Yes
1
ucs2_general_ci
ucs2
35
Yes
Yes
1
ucs2_bin
ucs2
90

Yes
1
ucs2_unicode_ci
ucs2
128

Yes
8
ucs2_icelandic_ci
ucs2
129

Yes
8
ucs2_latvian_ci
ucs2
130

Yes
8
ucs2_romanian_ci
ucs2
131

Yes
8
ucs2_slovenian_ci
ucs2
132

Yes
8
ucs2_polish_ci
ucs2
133

Yes
8
ucs2_estonian_ci
ucs2
134

Yes
8
ucs2_spanish_ci
ucs2
135

Yes
8
ucs2_swedish_ci
ucs2
136

Yes
8
ucs2_turkish_ci
ucs2
137

Yes
8
ucs2_czech_ci
ucs2
138

Yes
8
ucs2_danish_ci
ucs2
139

Yes
8
ucs2_lithuanian_ci
ucs2
140

Yes
8
ucs2_slovak_ci
ucs2
141

Yes
8
ucs2_spanish2_ci
ucs2
142

Yes
8
ucs2_roman_ci
ucs2
143

Yes
8
ucs2_persian_ci
ucs2
144

Yes
8
ucs2_esperanto_ci
ucs2
145

Yes
8
ucs2_hungarian_ci
ucs2
146

Yes
8
ucs2_sinhala_ci
ucs2
147

Yes
8
ucs2_german2_ci
ucs2
148

Yes
8
ucs2_croatian_ci
ucs2
149

Yes
8
ucs2_unicode_520_ci
ucs2
150

Yes
8
ucs2_vietnamese_ci
ucs2
151

Yes
8
ucs2_general_mysql500_ci
ucs2
159

Yes
1
cp866_general_ci
cp866
36
Yes
Yes
1
cp866_bin
cp866
68

Yes
1
keybcs2_general_ci
keybcs2
37
Yes
Yes
1
keybcs2_bin
keybcs2
73

Yes
1
macce_general_ci
macce
38
Yes
Yes
1
macce_bin
macce
43

Yes
1
macroman_general_ci
macroman
39
Yes
Yes
1
macroman_bin
macroman
53

Yes
1
cp852_general_ci
cp852
40
Yes
Yes
1
cp852_bin
cp852
81

Yes
1
latin7_estonian_cs
latin7
20

Yes
1
latin7_general_ci
latin7
41
Yes
Yes
1
latin7_general_cs
latin7
42

Yes
1
latin7_bin
latin7
79

Yes
1
utf8mb4_general_ci
utf8mb4
45
Yes
Yes
1
utf8mb4_bin
utf8mb4
46

Yes
1
utf8mb4_unicode_ci
utf8mb4
224

Yes
8
utf8mb4_icelandic_ci
utf8mb4
225

Yes
8
utf8mb4_latvian_ci
utf8mb4
226

Yes
8
utf8mb4_romanian_ci
utf8mb4
227

Yes
8
utf8mb4_slovenian_ci
utf8mb4
228

Yes
8
utf8mb4_polish_ci
utf8mb4
229

Yes
8
utf8mb4_estonian_ci
utf8mb4
230

Yes
8
utf8mb4_spanish_ci
utf8mb4
231

Yes
8
utf8mb4_swedish_ci
utf8mb4
232

Yes
8
utf8mb4_turkish_ci
utf8mb4
233

Yes
8
utf8mb4_czech_ci
utf8mb4
234

Yes
8
utf8mb4_danish_ci
utf8mb4
235

Yes
8
utf8mb4_lithuanian_ci
utf8mb4
236

Yes
8
utf8mb4_slovak_ci
utf8mb4
237

Yes
8
utf8mb4_spanish2_ci
utf8mb4
238

Yes
8
utf8mb4_roman_ci
utf8mb4
239

Yes
8
utf8mb4_persian_ci
utf8mb4
240

Yes
8
utf8mb4_esperanto_ci
utf8mb4
241

Yes
8
utf8mb4_hungarian_ci
utf8mb4
242

Yes
8
utf8mb4_sinhala_ci
utf8mb4
243

Yes
8
utf8mb4_german2_ci
utf8mb4
244

Yes
8
utf8mb4_croatian_ci
utf8mb4
245

Yes
8
utf8mb4_unicode_520_ci
utf8mb4
246

Yes
8
utf8mb4_vietnamese_ci
utf8mb4
247

Yes
8
cp1251_bulgarian_ci
cp1251
14

Yes
1
cp1251_ukrainian_ci
cp1251
23

Yes
1
cp1251_bin
cp1251
50

Yes
1
cp1251_general_ci
cp1251
51
Yes
Yes
1
cp1251_general_cs
cp1251
52

Yes
1
utf16_general_ci
utf16
54
Yes
Yes
1
utf16_bin
utf16
55

Yes
1
utf16_unicode_ci
utf16
101

Yes
8
utf16_icelandic_ci
utf16
102

Yes
8
utf16_latvian_ci
utf16
103

Yes
8
utf16_romanian_ci
utf16
104

Yes
8
utf16_slovenian_ci
utf16
105

Yes
8
utf16_polish_ci
utf16
106

Yes
8
utf16_estonian_ci
utf16
107

Yes
8
utf16_spanish_ci
utf16
108

Yes
8
utf16_swedish_ci
utf16
109

Yes
8
utf16_turkish_ci
utf16
110

Yes
8
utf16_czech_ci
utf16
111

Yes
8
utf16_danish_ci
utf16
112

Yes
8
utf16_lithuanian_ci
utf16
113

Yes
8
utf16_slovak_ci
utf16
114

Yes
8
utf16_spanish2_ci
utf16
115

Yes
8
utf16_roman_ci
utf16
116

Yes
8
utf16_persian_ci
utf16
117

Yes
8
utf16_esperanto_ci
utf16
118

Yes
8
utf16_hungarian_ci
utf16
119

Yes
8
utf16_sinhala_ci
utf16
120

Yes
8
utf16_german2_ci
utf16
121

Yes
8
utf16_croatian_ci
utf16
122

Yes
8
utf16_unicode_520_ci
utf16
123

Yes
8
utf16_vietnamese_ci
utf16
124

Yes
8
utf16le_general_ci
utf16le
56
Yes
Yes
1
utf16le_bin
utf16le
62

Yes
1
cp1256_general_ci
cp1256
57
Yes
Yes
1
cp1256_bin
cp1256
67

Yes
1
cp1257_lithuanian_ci
cp1257
29

Yes
1
cp1257_bin
cp1257
58

Yes
1
cp1257_general_ci
cp1257
59
Yes
Yes
1
utf32_general_ci
utf32
60
Yes
Yes
1
utf32_bin
utf32
61

Yes
1
utf32_unicode_ci
utf32
160

Yes
8
utf32_icelandic_ci
utf32
161

Yes
8
utf32_latvian_ci
utf32
162

Yes
8
utf32_romanian_ci
utf32
163

Yes
8
utf32_slovenian_ci
utf32
164

Yes
8
utf32_polish_ci
utf32
165

Yes
8
utf32_estonian_ci
utf32
166

Yes
8
utf32_spanish_ci
utf32
167

Yes
8
utf32_swedish_ci
utf32
168

Yes
8
utf32_turkish_ci
utf32
169

Yes
8
utf32_czech_ci
utf32
170

Yes
8
utf32_danish_ci
utf32
171

Yes
8
utf32_lithuanian_ci
utf32
172

Yes
8
utf32_slovak_ci
utf32
173

Yes
8
utf32_spanish2_ci
utf32
174

Yes
8
utf32_roman_ci
utf32
175

Yes
8
utf32_persian_ci
utf32
176

Yes
8
utf32_esperanto_ci
utf32
177

Yes
8
utf32_hungarian_ci
utf32
178

Yes
8
utf32_sinhala_ci
utf32
179

Yes
8
utf32_german2_ci
utf32
180

Yes
8
utf32_croatian_ci
utf32
181

Yes
8
utf32_unicode_520_ci
utf32
182

Yes
8
utf32_vietnamese_ci
utf32
183

Yes
8
binary
binary
63
Yes
Yes
1
geostd8_general_ci
geostd8
92
Yes
Yes
1
geostd8_bin
geostd8
93

Yes
1
cp932_japanese_ci
cp932
95
Yes
Yes
1
cp932_bin
cp932
96

Yes
1
eucjpms_japanese_ci
eucjpms
97
Yes
Yes
1
eucjpms_bin
eucjpms
98

Yes
1
gb18030_chinese_ci
gb18030
248
Yes
Yes
2
gb18030_bin
gb18030
249

Yes
1
gb18030_unicode_520_ci
gb18030
250

Yes
8





    排序规则是一组定义如何比较和排序字符串的规则。 MySQL中的每个排序规则都属于一个字符集。每个字符集至少具有一个排序规则,大多数字符集具有两个或多个排序规则。排序规则根据权重对字符排序。字符集中的每个字符都映射到权重。权重相等的字符比较相等,权重不相等的字符根据其权重的相对大小进行比较。 WEIGHT_STRING()函数可用于查看字符串中字符的权重。它返回以表示重量的值是一个二进制字符串,因此使用HEX(WEIGHT_STRING(str))以可打印的形式显示重量很方便。下面的示例显示,如果“ AaBb”中的字母为非二进制不区分大小写的字符串,则其字母大小写的权重不会有所不同,但如果它是二进制字符串,则权重不会有所不同:



[Shell] 纯文本查看 复制代码
mysql> SELECT HEX(WEIGHT_STRING('AaBb' COLLATE latin1_swedish_ci));
+------------------------------------------------------+
| HEX(WEIGHT_STRING('AaBb' COLLATE latin1_swedish_ci)) |
+------------------------------------------------------+
| 41414242                                             |
+------------------------------------------------------+
mysql> SELECT HEX(WEIGHT_STRING(BINARY 'AaBb'));
+-----------------------------------+
| HEX(WEIGHT_STRING(BINARY 'AaBb')) |
+-----------------------------------+
| 41614262                          |
+-----------------------------------+





注意:如果您修改现有的排序规则,则可能会影响使用该排序规则的列上的索引的行顺序。在这种情况下,请重建任何此类索引,以避免出现诸如查询结果不正确的问题。







知识改变命运!

JAVA EE 学习     JAVA EE 资料
JEE Study:企业级开发学习网!
回复

使用道具 举报

高级模式
B Color Image Link Quote Code Smilies

本版积分规则

QQ|Archiver|手机版|小黑屋|JEE Study ( 京ICP备16036936   JeeStudy企业开发官网①

GMT+8, 2021-1-27 18:08 , Processed in 0.205046 second(s), 28 queries .

Powered by JeeStudy!

© 2008-2020 JEE Study 企业级开发学习网

快速回复 返回顶部 返回列表