Design and Research of Tibetan Spoken Speech Corpus

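2018, 54(13)

1 Introduction

Speech corpora provide the foundational data support for research and development in speech processing. Corpus-based speech recognition has achieved great success for major languages such as Chinese and English, and it is currently the fastest-developing, most productive, and most practically promising speech processing technology. Tibetan is a regional language in common use among the Tibetan people. Advances in Tibetan speech processing can effectively facilitate communication between Tibetan areas and other regions, strengthen exchange between ethnic groups, and thereby support economic, scientific, technological, and cultural development in Tibetan areas. Compared with major languages such as Chinese and English, Tibetan has fewer speakers, and the region has a weaker economic base and less developed science and education; as a result, Tibetan speech recognition not only started late, but the related research also lags far behind [1]. Although speech recognition methods based on the Hidden Markov Model (HMM) and the Deep Neural Network (DNN) have achieved remarkable results for major languages such as Chinese and English [2], they have not performed equally well on Tibetan spoken speech recognition, even when deep neural networks are used for feature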

HUANG Xiaohui (黄晓辉) 1,2, LI Jing (李京) 1, MA Rui (马睿) 2,3

1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China

2. Department of Engineering, PLA University of Foreign Languages, Luoyang, Henan 471003, China

3. Institute of Tibetology, Minzu University of China, Beijing 100081, China

HUANG Xiaohui, LI Jing, MA Rui. Design and research of Tibetan spoken speech corpus. Computer Engineering and Applications, 2018, 54(13): 231-235.

Abstract: Based on a study of how general-purpose speech corpora are constructed, and taking into account the requirements of natural spoken speech recognition research and the basic characteristics of natural spoken Tibetan, this paper designs a construction scheme and a corresponding annotation standard for a spoken speech corpus suited to Tibetan speech recognition. Following this scheme, a 50-hour Lhasa Tibetan spoken speech corpus is built with five layers of annotation: phonemes, semi-syllables, syllables, Tibetan characters, and sentences. Statistics show that the corpus retains the natural properties of spoken speech while providing balanced coverage of commonly used modeling units such as phonemes and semi-syllables, so it offers reliable data support for speech recognition research based on Tibetan spoken speech data.

Key words: speech corpus; spoken speech; speech recognition; annotation standard; Lhasa Tibetan

Document code: A    CLC number: TP391    doi: 10.3778/j.issn.1002-8331.1702-0269

Funding: National Key Research and Development Program of China (No. 2016YFB0201402).

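About the authors: HUANG Xiaohui (1986-), male, Ph.D. candidate, lecturer, research interests: deep learning and natural language processing, E-mail: huangxia@https://www.360docs.net/doc/f716440112.html; LI Jing (1966-), male, Ph.D., professor, research interest: big data processing; MA Rui (1990-), male, master's candidate, lecturer, research interest: Tibetan language and literature.

Received: 2017-02-24    Revised: 2017-04-19    Article ID: 1002-8331(2018)13-0231-05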

CNKI online publication: 2017-07-19, https://www.360docs.net/doc/f716440112.html,/kcms/detail/11.2127.TP.20170719.1059.024.html

Computer Engineering and Applications 计算机工程与应用
