Introduction to Cyrillic Character Encodings

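The Cyrillic alphabet, also called the Slavic alphabet (Russian: Кириллический алфавит or кириллица, transliterated kirillica), is the alphabetic writing system used by most peoples of the Slavic language group.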

History

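The Cyrillic alphabet derives from the Glagolitic alphabet, which in turn grew out of the Greek alphabet. It is generally believed to have been created in the 9th century by the Christian missionaries Saint Cyril (827–869) and Saint Methodius, to make it easier to spread Christianity among the Slavic peoples (at a time when the Eastern and Western churches had not yet formally split). The Slavic peoples adopted it widely, which is why it is sometimes called the Slavic alphabet. The early Cyrillic alphabet is also known as the Old Church Slavonic alphabet; the modern Slavic alphabets are revised forms of it.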

Usage

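Many of the languages currently written in Cyrillic belong to the Slavic group, including Russian, Ukrainian, Rusyn, Belarusian, Bulgarian, Serbian, and Macedonian. The West Slavic languages, such as Sorbian, Polish, Czech, and Slovak, have always been written in the Latin alphabet. Serbian, Croatian, and Bosnian, which belong to the southwestern Slavic branch, were originally regarded as a single language (see Serbo-Croatian) but split into three for religious and ethnic reasons. Croatian and Bosnian are written in the Latin alphabet, while Serbian is written in both Cyrillic and Latin.

Around 1930 the Soviet Union reformed the writing systems of many of its minority peoples, replacing their existing scripts with Cyrillic, so many languages of the former Soviet Union are now written in Cyrillic. Those with larger numbers of speakers include Kazakh, Tajik, Kyrgyz, Uyghur, Bashkir, Chuvash, Chechen, Kabardian, Mari, Avar, and Udmurt. Mongolia also switched its written language to Cyrillic. Azerbaijani, Georgian, Turkmen, Uzbek, and others were at one time adapted to Cyrillic and reverted to their own earlier scripts after gaining independence from the Soviet Union.

Within the Russian Federation, Chechen (used in the Chechen Republic) and Tatar (used in the Republic of Tatarstan) were to be switched to the Latin alphabet, but the plans were blocked by the Soviet and Russian governments. The Soviet Union also converted the written language of Moldavia (Moldavian) to Cyrillic; after Moldova became independent, however, most Moldovans came to hold that a separate Moldovan language does not exist and that the language they use is Romanian.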

Slavic languages

Old Church Slavonic

Early Cyrillic alphabet

?, А, Б, В, Г, ?, Е, ?, Ж

З, ?, ?, Й, К, Л, М

Н, О,?, П, Р, С,?

У, Ф,?, ?, Х, Ц, Ч, Ш

Щ,Щ?, Ы,?, Э,И?

?, ?, ?, ?(Эр,Yus)

?(ЙрЯз,Ksi), ?(ОрЯз,Psi), ?(РзсЯ,Sita)

?, ?, ?

Russian

Ukrainian

Belarusian

Bulgarian

Serbian

Macedonian

Abkhaz

Chuvash

Kazakh

Uyghur

Mongolian

Buryat

Kalmyk

Computer encodings

KOI-8 encoding

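The KOI-8 family was a very popular group of Cyrillic computer encodings before Unicode became widespread. Its main variants include KOI8-R (Russian) and KOI8-U (Ukrainian).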

ISO 8859-5 encoding

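ISO 8859-5 is the Cyrillic computer encoding defined by the International Organization for Standardization, but it is used less widely than KOI-8.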

Unicode encoding

Cyrillic


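        0 1 2 3 4 5 6 7 8 9 A B C D E F
U+040x  Ѐ Ё Ђ Ѓ Є Ѕ І Ї Ј Љ Њ Ћ Ќ Ѝ Ў Џ
U+041x  А Б В Г Д Е Ж З И Й К Л М Н О П
U+042x  Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я
U+043x  а б в г д е ж з и й к л м н о п
U+044x  р с т у ф х ц ч ш щ ъ ы ь э ю я
U+045x  ѐ ё ђ ѓ є ѕ і ї ј љ њ ћ ќ ѝ ў џ
U+046x  ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
U+047x  ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
U+048x  ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
U+049x  ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
U+04Ax  ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
U+04Bx  ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
U+04Cx  ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
U+04Dx  ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
U+04Ex  ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
U+04Fx  ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?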
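The layout of the Unicode Cyrillic block makes case conversion a matter of simple arithmetic: the capitals А..Я at U+0410..U+042F sit exactly 0x20 below the small letters а..я at U+0430..U+044F, and the capitals at U+0400..U+040F sit 0x50 below their small forms at U+0450..U+045F. The following is a minimal sketch of that idea; the function name `cyr_tolower` is a hypothetical helper, and it deliberately ignores code points from U+0460 upward.

```c
#include <stdint.h>

/* Lowercase a code point from the U+0400..U+045F part of the Cyrillic block.
 * Hypothetical helper for illustration; any other code point is
 * returned unchanged. */
uint32_t cyr_tolower(uint32_t cp)
{
    if (cp >= 0x0410 && cp <= 0x042F)   /* А..Я -> а..я */
        return cp + 0x20;
    if (cp >= 0x0400 && cp <= 0x040F)   /* Ѐ..Џ -> ѐ..џ */
        return cp + 0x50;
    return cp;                          /* already lowercase, or not handled */
}
```

For example, `cyr_tolower(0x0401)` (Ё) yields 0x0451 (ё).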

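KOI-8 to Unicode mapping table. Bytes 0x00–0x7F are ASCII-compatible, so the table lists only the mappings for 0x80–0xFF. The same applies to the tables below.

/* From old Koi-8 to Unicode */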

long oldkoi8tou[128] = {
    /* 0x80..0xBF: every entry is the sentinel -2 (no mapping given) */
    -2, -2, -2, -2, -2, -2, -2, -2,
    -2, -2, -2, -2, -2, -2, -2, -2,
    -2, -2, -2, -2, -2, -2, -2, -2,
    -2, -2, -2, -2, -2, -2, -2, -2,
    -2, -2, -2, -2, -2, -2, -2, -2,
    -2, -2, -2, -2, -2, -2, -2, -2,
    -2, -2, -2, -2, -2, -2, -2, -2,
    -2, -2, -2, -2, -2, -2, -2, -2,
    /* 0xC0..0xDF: lowercase letters */
    0x044e,0x0430,0x0431,0x0446,0x0434,0x0435,0x0444,0x0433,
    0x0445,0x0438,0x0439,0x043a,0x043b,0x043c,0x043d,0x043e,
    0x043f,0x044f,0x0440,0x0441,0x0442,0x0443,0x0436,0x0432,
    0x044c,0x044b,0x0437,0x0448,0x044d,0x0449,0x0447,0x044a,
    /* 0xE0..0xFF: uppercase letters */
    0x042e,0x0410,0x0411,0x0426,0x0414,0x0415,0x0424,0x0413,
    0x0425,0x0418,0x0419,0x041a,0x041b,0x041c,0x041d,0x041e,
    0x041f,0x042f,0x0420,0x0421,0x0422,0x0423,0x0416,0x0412,
    0x042c,0x042b,0x0417,0x0428,0x042d,0x0429,0x0427,0x042a };
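As a rough illustration of how such a table is consulted, the sketch below maps a single old KOI-8 byte to a Unicode code point. The names `koi8_letters` and `oldkoi8_to_unicode` are hypothetical; the array repeats only the 0xC0..0xFF part of the table above, and the function returns -2 for 0x80..0xBF on the assumption that negative values simply mean "no mapping".

```c
/* Entries for bytes 0xC0..0xFF, copied from the table above. */
static const long koi8_letters[64] = {
    0x044e,0x0430,0x0431,0x0446,0x0434,0x0435,0x0444,0x0433,
    0x0445,0x0438,0x0439,0x043a,0x043b,0x043c,0x043d,0x043e,
    0x043f,0x044f,0x0440,0x0441,0x0442,0x0443,0x0436,0x0432,
    0x044c,0x044b,0x0437,0x0448,0x044d,0x0449,0x0447,0x044a,
    0x042e,0x0410,0x0411,0x0426,0x0414,0x0415,0x0424,0x0413,
    0x0425,0x0418,0x0419,0x041a,0x041b,0x041c,0x041d,0x041e,
    0x041f,0x042f,0x0420,0x0421,0x0422,0x0423,0x0416,0x0412,
    0x042c,0x042b,0x0417,0x0428,0x042d,0x0429,0x0427,0x042a
};

/* Map one old KOI-8 byte to a Unicode code point.
 * Returns a negative value when the table has no mapping
 * (an assumption about what the -2 sentinel means). */
long oldkoi8_to_unicode(unsigned char b)
{
    if (b < 0x80)
        return b;                  /* 0x00..0x7F are ASCII-compatible */
    if (b < 0xC0)
        return -2;                 /* 0x80..0xBF: no mapping in the table */
    return koi8_letters[b - 0xC0]; /* 0xC0..0xFF: Cyrillic letters */
}
```

The same pattern (pass ASCII through, subtract the table's base offset, index) applies to the other tables in this section.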

Code Page 866 to Unicode mapping table

/* From CP866 to Unicode */

long cp866tou[128] = {
    /* 0x80..0xAF: А..Я, а..п */
    0x0410,0x0411,0x0412,0x0413,0x0414,0x0415,0x0416,0x0417,
    0x0418,0x0419,0x041a,0x041b,0x041c,0x041d,0x041e,0x041f,
    0x0420,0x0421,0x0422,0x0423,0x0424,0x0425,0x0426,0x0427,
    0x0428,0x0429,0x042a,0x042b,0x042c,0x042d,0x042e,0x042f,
    0x0430,0x0431,0x0432,0x0433,0x0434,0x0435,0x0436,0x0437,
    0x0438,0x0439,0x043a,0x043b,0x043c,0x043d,0x043e,0x043f,
    /* 0xB0..0xDF: shading and box-drawing characters */
    0x2591,0x2592,0x2593,0x2502,0x2524,0x2561,0x2562,0x2556,
    0x2555,0x2563,0x2551,0x2557,0x255d,0x255c,0x255b,0x2510,
    0x2514,0x2534,0x252c,0x251c,0x2500,0x253c,0x255e,0x255f,
    0x255a,0x2554,0x2569,0x2566,0x2560,0x2550,0x256c,0x2567,
    0x2568,0x2564,0x2565,0x2559,0x2558,0x2552,0x2553,0x256b,
    0x256a,0x2518,0x250c,0x2588,0x2584,0x258c,0x2590,0x2580,
    /* 0xE0..0xEF: р..я */
    0x0440,0x0441,0x0442,0x0443,0x0444,0x0445,0x0446,0x0447,
    0x0448,0x0449,0x044a,0x044b,0x044c,0x044d,0x044e,0x044f,
    /* 0xF0..0xFF: extra letters and symbols.
       Note: the Unicode.org CP866 mapping gives U+2219 for 0xF9
       (U+2022 here) and U+00A0 for 0xFF (-1, unmapped, here). */
    0x0401,0x0451,0x0404,0x0454,0x0407,0x0457,0x040e,0x045e,
    0x00b0,0x2022,0x00b7,0x221a,0x2116,0x00a4,0x25a0, -1 };
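Because the Russian letters occupy contiguous runs in CP866, the reverse direction (Unicode to CP866) can be sketched with range arithmetic instead of a search through the table. The function name `unicode_to_cp866` is hypothetical, and the sketch covers only ASCII, А..я, and Ё/ё; box-drawing characters and the other symbols are reported as unmappable.

```c
/* Map a Unicode code point to a CP866 byte.
 * Covers ASCII, the letters А..я, and Ё/ё only (illustrative subset).
 * Returns -1 for any code point outside that subset. */
int unicode_to_cp866(unsigned long cp)
{
    if (cp < 0x80)
        return (int)cp;                      /* ASCII passes through */
    if (cp >= 0x0410 && cp <= 0x043F)
        return (int)(cp - 0x0410 + 0x80);    /* А..п -> 0x80..0xAF */
    if (cp >= 0x0440 && cp <= 0x044F)
        return (int)(cp - 0x0440 + 0xE0);    /* р..я -> 0xE0..0xEF */
    if (cp == 0x0401)
        return 0xF0;                         /* Ё */
    if (cp == 0x0451)
        return 0xF1;                         /* ё */
    return -1;                               /* not covered by this sketch */
}
```

The split into two letter ranges reflects the gap at 0xB0..0xDF, which CP866 reserves for box-drawing characters.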

Code Page 1251 (used by Windows) to Unicode mapping table

/* From CP1251 to Unicode */

long cp1251tou[128] = {
    /* 0x80..0xBF: punctuation, symbols, and non-Russian Cyrillic letters.
       -1 marks bytes left unmapped here (0x88 and 0x98); later revisions
       of CP1251 assign U+20AC (euro sign) to 0x88. */
    0x0402,0x0403,0x201a,0x0453,0x201e,0x2026,0x2020,0x2021,
        -1,0x2030,0x0409,0x2039,0x040a,0x040c,0x040b,0x040f,
    0x0452,0x2018,0x2019,0x201c,0x201d,0x2022,0x2013,0x2014,
        -1,0x2122,0x0459,0x203a,0x045a,0x045c,0x045b,0x045f,
    0x00a0,0x040e,0x045e,0x0408,0x00a4,0x0490,0x00a6,0x00a7,
    0x0401,0x00a9,0x0404,0x00ab,0x00ac,0x00ad,0x00ae,0x0407,
    0x00b0,0x00b1,0x0406,0x0456,0x0491,0x00b5,0x00b6,0x00b7,
    0x0451,0x2116,0x0454,0x00bb,0x0458,0x0405,0x0455,0x0457,
    /* 0xC0..0xFF: А..Я, а..я in Unicode order */
    0x0410,0x0411,0x0412,0x0413,0x0414,0x0415,0x0416,0x0417,
    0x0418,0x0419,0x041a,0x041b,0x041c,0x041d,0x041e,0x041f,
    0x0420,0x0421,0x0422,0x0423,0x0424,0x0425,0x0426,0x0427,
    0x0428,0x0429,0x042a,0x042b,0x042c,0x042d,0x042e,0x042f,
    0x0430,0x0431,0x0432,0x0433,0x0434,0x0435,0x0436,0x0437,
    0x0438,0x0439,0x043a,0x043b,0x043c,0x043d,0x043e,0x043f,
    0x0440,0x0441,0x0442,0x0443,0x0444,0x0445,0x0446,0x0447,
    0x0448,0x0449,0x044a,0x044b,0x044c,0x044d,0x044e,0x044f };
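A typical use of such a mapping is converting legacy text to UTF-8. The sketch below does this for CP1251 under simplifying assumptions: it handles only ASCII, the letters at 0xC0..0xFF (which follow Unicode order, so no table is needed), and Ё/ё at 0xA8/0xB8; every other byte is replaced with '?'. The names `cp1251_letter` and `cp1251_to_utf8` are hypothetical, and a full converter would index the 128-entry table instead.

```c
#include <stddef.h>

/* Code point for a CP1251 byte: ASCII, А..я and Ё/ё only.
 * Returns -1 for bytes this sketch does not cover. */
static long cp1251_letter(unsigned char b)
{
    if (b < 0x80)  return b;
    if (b >= 0xC0) return 0x0410 + (b - 0xC0);  /* А..я are contiguous */
    if (b == 0xA8) return 0x0401;               /* Ё */
    if (b == 0xB8) return 0x0451;               /* ё */
    return -1;
}

/* Convert n CP1251 bytes to NUL-terminated UTF-8 in out (capacity cap).
 * Returns the number of bytes written, not counting the NUL,
 * or (size_t)-1 if out is too small. */
size_t cp1251_to_utf8(const unsigned char *in, size_t n, char *out, size_t cap)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        long cp = cp1251_letter(in[i]);
        if (cp < 0)
            cp = '?';                           /* unsupported byte */
        if (cp < 0x80) {
            if (o + 1 >= cap) return (size_t)-1;
            out[o++] = (char)cp;
        } else {
            /* U+0080..U+07FF encode as two UTF-8 bytes: 110xxxxx 10xxxxxx */
            if (o + 2 >= cap) return (size_t)-1;
            out[o++] = (char)(unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (char)(unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    if (o >= cap) return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

Every letter in this subset lies below U+0800, which is why the two-byte UTF-8 form is sufficient here.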

ISO 8859-5 to Unicode mapping table

/* From ISO8859-5 to Unicode */

/* The array keeps its original name, newkoi8tou, although it holds
   the ISO 8859-5 mapping. */
long newkoi8tou[128] = {
    /* 0x80..0x9F: -1, no mapping given */
    -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1,
    /* 0xA0..0xAF */
    0x00a0,0x0401,0x0402,0x0403,0x0404,0x0405,0x0406,0x0407,
    0x0408,0x0409,0x040a,0x040b,0x040c,0x00ad,0x040e,0x040f,
    /* 0xB0..0xEF: А..Я, а..я in Unicode order */
    0x0410,0x0411,0x0412,0x0413,0x0414,0x0415,0x0416,0x0417,
    0x0418,0x0419,0x041a,0x041b,0x041c,0x041d,0x041e,0x041f,
    0x0420,0x0421,0x0422,0x0423,0x0424,0x0425,0x0426,0x0427,
    0x0428,0x0429,0x042a,0x042b,0x042c,0x042d,0x042e,0x042f,
    0x0430,0x0431,0x0432,0x0433,0x0434,0x0435,0x0436,0x0437,
    0x0438,0x0439,0x043a,0x043b,0x043c,0x043d,0x043e,0x043f,
    0x0440,0x0441,0x0442,0x0443,0x0444,0x0445,0x0446,0x0447,
    0x0448,0x0449,0x044a,0x044b,0x044c,0x044d,0x044e,0x044f,
    /* 0xF0..0xFF: 0xFB is U+045B and 0xFD is U+00A7 (section sign) */
    0x2116,0x0451,0x0452,0x0453,0x0454,0x0455,0x0456,0x0457,
    0x0458,0x0459,0x045a,0x045b,0x045c,0x00a7,0x045e,0x045f };
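ISO 8859-5 lays its Cyrillic letters out in the same order as Unicode, so for most bytes from 0xA1 to 0xFF the code point is simply the byte value plus 0x0360 (for example 0xB0 + 0x0360 = U+0410, А). The sketch below relies on that pattern and on the four exceptions defined by the ISO 8859-5 standard: 0xA0 (no-break space), 0xAD (soft hyphen), 0xF0 (numero sign), and 0xFD (section sign). The function name `iso8859_5_to_unicode` is hypothetical, and bytes 0x80..0x9F are treated as unmapped, as in the table.

```c
/* Map one ISO 8859-5 byte to a Unicode code point by arithmetic.
 * Returns -1 for 0x80..0x9F, which the table leaves unmapped. */
long iso8859_5_to_unicode(unsigned char b)
{
    if (b < 0x80)
        return b;                    /* ASCII */
    if (b < 0xA0)
        return -1;                   /* 0x80..0x9F: no mapping given */
    switch (b) {
    case 0xA0: return 0x00A0;        /* no-break space */
    case 0xAD: return 0x00AD;        /* soft hyphen */
    case 0xF0: return 0x2116;        /* numero sign */
    case 0xFD: return 0x00A7;        /* section sign */
    default:   return b + 0x0360;    /* 0xA1 -> U+0401 ... 0xFF -> U+045F */
    }
}
```

A table lookup and this arithmetic form should agree for every byte; the arithmetic version mainly makes the regular structure of the encoding visible.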
