Introduction to CKIP Chinese Word Segmentation System for the First International Chinese Word Segmentation Bakeoff

Wei-Yun Ma
Institute of Information Science, Academia Sinica
ma@.tw

Keh-Jiann Chen
Institute of Information Science, Academia Sinica
kchen@.tw

Abstract

In this paper, we briefly describe the procedures of our segmentation system, including the methods for resolving segmentation ambiguities and identifying unknown words. The CKIP group of Academia Sinica participated in testing on the open and closed tracks of Beijing University (PK) and Hong Kong CityU (HK). The evaluation results show that our system performs very well in both the HK open track and the HK closed track, and just acceptably in the PK tracks. Some explanations and analysis are presented in this paper.

1 Introduction

At the first international Chinese Word Segmentation Bakeoff, Academia Sinica participated in testing on the open and closed tracks of Beijing University (PK) and Hong Kong CityU (HK). The same segmentation algorithm was applied to both corpora, except for a character code conversion from GB to BIG5 for the PK corpus and a few modifications made to accommodate the different segmentation standards. The difference between the open and closed tracks is that, when processing the open track, we consulted the Academia Sinica lexicon in addition to the lexicon derived from the specific corpus, in order to enlarge the word collection.

It is well known that there are two major difficulties in Chinese word segmentation. One is resolving ambiguous segmentations, and the other is identifying unknown words.

Our earlier work mainly focused on resolving segmentation ambiguities and on using regular expressions to handle determinant-measure and reduplication compounds (Chen & Liu 1992, Chen 1999). We adopted a variation of the longest matching algorithm with several heuristic rules to resolve the ambiguities and achieved a success rate of 99.77%, not counting the mistakes caused by unknown words.
After that, we paid more attention to the problems of extracting and identifying unknown words (Chen et al. 1997, Chen & Bai 1998, Chen & Ma 2002, Tseng & Chen 2002, Ma & Chen 2003). The process of unknown word extraction can be roughly divided into two steps, i.e. a detection process and an extraction process. The detection process detects possible occurrences of unknown words (Chen & Bai 1998), so that deeper morphological analysis is carried out only at the places where unknown word morphemes were detected (Chen & Ma 2002). A bottom-up merging algorithm was proposed in (Ma & Chen 2003), which utilizes hybrid statistical and linguistic information to extract unknown words effectively. In addition to the bakeoff results evaluated by SIGHAN, we also present some other relevant experimental results and provide an analysis of the system performance in the following sections.

2 System Overview

Figure 1 illustrates the block diagram of the segmentation system used in this contest. The first two steps of the word segmentation algorithm are word matching and resolution of ambiguous matches. These two processes were performed in parallel. The algorithm reads the input sentences from left to right and matches the input character string against lexemes. In (Chen & Liu 1992), if an ambiguous segmentation does occur, the matching algorithm looks ahead two more words, and the disambiguation rules for those three-word chunks are applied afterward. For instance, in (1), the first matched word could be '' or ''. The algorithm then looks ahead and takes all of the possible combinations of three-word chunks, as shown in (2), into consideration.

Figure 1. Flowchart of the system

(1)
complete authenticate report
"complete the report about authenticating"

(2)

The disambiguation algorithm selects the first word of the most plausible chunk as the solution according to heuristic rules.
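The look-ahead step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CKIP implementation: the toy lexicon, the names `match_words` and `three_word_chunks`, the `max_len` limit, and the fallback that accepts any single character as a word are all hypothetical choices made for the example. The sketch only enumerates the three-word chunks that can start at a given position, i.e. the candidate set from which the disambiguation step would choose.

```python
from typing import List, Set, Tuple


def match_words(text: str, start: int, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Return every candidate word that begins at position `start`.

    Assumption for this sketch: a single character is always accepted,
    so the matcher never gets stuck on characters missing from the lexicon.
    """
    words = [text[start]] if start < len(text) else []
    for end in range(start + 2, min(len(text), start + max_len) + 1):
        if text[start:end] in lexicon:
            words.append(text[start:end])
    return words


def three_word_chunks(text: str, start: int, lexicon: Set[str]) -> List[Tuple[str, ...]]:
    """Enumerate all chunks of up to three consecutive words starting at `start`."""
    chunks: List[Tuple[str, ...]] = []

    def extend(pos: int, chunk: Tuple[str, ...]) -> None:
        # Stop once three words are collected or the input is exhausted.
        if len(chunk) == 3 or pos == len(text):
            chunks.append(chunk)
            return
        for word in match_words(text, pos, lexicon):
            extend(pos + len(word), chunk + (word,))

    extend(start, ())
    return chunks


# Latin letters stand in for Chinese characters (hypothetical toy data).
toy_lexicon = {"ab", "bc", "cd", "de"}
candidates = three_word_chunks("abcde", 0, toy_lexicon)
```

With this toy lexicon, several candidates cover the same characters with different word boundaries, which is the kind of ambiguity that the selection step has to resolve by ranking the chunks and keeping only the first word of the winner.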
The first heuristic rule is:

Longest Matching Rule: The most plausible segmentation is the three-word sequence with the longest length.

In the above example, the longest matched three-word chunk is (1). Therefore the first segmented word is ''. This heuristic rule achieves an accuracy as high as 99.69% and a high applicability of 93.21%, i.e. 93.21% of the ambiguities were resolved by this rule. However, about 6.79% of the ambiguities remain, i.e. three-word chunks with the same length but different segmentations, which cannot be resolved by the longest matching rule. The following heuristic rules were used for further resolution.

Word Length Rule: Pick the three-word chunk that has the smallest standard deviation in the length of the three words.

Morphemic Rules:
(a) Pick the chunk with fewer bound morphemes.
(b) Pick the chunk with fewer characters in compound words.

Probability Rule:
(a) Pick the chunk with the high-frequency monosyllabic words.
(b) Pick the chunk with the highest probability value.

After the disambiguation process, an input sentence is segmented into a word sequence. Then, for the needs of the subsequent unknown word extraction, a POS bigram tagging model is applied to tag the POS of the words.

Unknown words in the input text will clearly be segmented into several adjacent tokens (known words or monosyllabic morphemes). At the unknown word detection stage, a set of syntactic discriminators, trained from a word-segmented corpus, then decides for every monosyllable whether it is a word or an unknown word morpheme.

(3)
if can increase gross profit rate
"if gross profit rate can be increased…"

(4)
after first step word segmentation:
after unknown word detection: (?) (?) (?)
after unknown word extraction:

For example, the correct segmentation of (3) is shown, but the unknown word "" is segmented into three monosyllabic words after the first step of the word segmentation process.
In (4), the unknown word detection process marks the sentence as "() () () (?) (?) (?)", where (?) denotes a detected monosyllabic unknown word morpheme and () denotes a common word. During the extraction process, the rule matching process focuses only on the morphemes marked with (?) and tries to combine them with their left or right neighbors according to the rules for unknown words. After that, the unknown word "" is extracted.

We adopt a bottom-up merging algorithm (Ma & Chen 2003), which utilizes hybrid statistical and linguistic information, to extract unknown words.

3 Adaptation for Different Tracks

It is known that different segmentation standards can affect segmentation performance significantly. In this contest, due to limited preparation time, we mainly focused on adjusting the regular expressions for determinant-measure compounds according to the HK and PK segmentation standards.

While processing the PK track, we adopted a shortcut of converting GB codes to BIG5 codes to cope with the difference in character coding. Instead of redesigning or reimplementing the segmentation system for GB, we converted the training and testing PK corpora into BIG5 versions and performed the segmentation in the BIG5 environment. The segmented results were then translated back to GB code as the final outputs. In contrast, processing the HK corpus was easier for us, because our system was designed for the BIG5 environment.

As for the lexicons, for the closed test both the PK and HK lexicons are derived from the word sets of the respective training corpus. For the open test, each lexicon was enhanced by adding the lexical entries of the CKIP lexicon. The sizes of the lexicons are shown in Table 1.

                                                               HK     PK
# of lexical entries (HK/PK) for closed test                   22K    50K
# of lexical entries (HK/PK joined with CKIP) for open test    140K   156K
Note: The size of the CKIP lexicon is 133K.
Table 1. The sizes of lexicons

Syntactic categories of words were utilized in the unknown word detection and extraction processes.
We do not have syntactic categories for words that are not in the CKIP lexicon. Therefore, we use the association strength between morphemes and syntactic categories to predict the category of a new word (Chen et al. 1997, Tseng & Chen 2002). The accuracy rate is about 80%.

4 Evaluation Results

There are several evaluation indexes provided by SIGHAN, i.e. test recall (R), test precision (P), F score, the out-of-vocabulary (OOV) rate for the test corpus, the recall on OOV words (R_oov), and the recall on in-vocabulary words (R_iv).

Table 2 shows the evaluation results of our system in the HK closed and open tracks. For both tracks, our system achieved the top rank in F score.

         R      P      F      OOV    R_oov  R_iv
Closed   0.947  0.934  0.940  0.071  0.625  0.972
Open     0.958  0.954  0.956  0.071  0.788  0.971
Note: The word count of the testing corpus is 34955
Table 2. Scores for HK

The evaluations of our system in the PK closed and open tracks are shown in Table 3. For the PK closed track, our system ranks 6th among 10 systems, and for the PK open track, it ranks 3rd among 8 systems.

         R      P      F      OOV    R_oov  R_iv
Closed   0.939  0.934  0.936  0.069  0.642  0.961
Open     0.939  0.938  0.938  0.069  0.675  0.959
Note: The word count of the testing corpus is 34955
Table 3. Scores for PK

Because the Academia Sinica (AS) corpora were provided by us, we were not allowed to participate in any AS track in this contest. Nevertheless, in this report we show the performance of our system on the AS closed track in Table 4. Our system would rank first if its result were compared with those of the other 6 participants in the AS closed track.

R      P      F      OOV    R_oov  R_iv
0.968  0.966  0.967  0.022  0.657  0.975
Note: The word count of the testing corpus is 11985
Table 4. Scores for AS closed

5 Discussions and Conclusions

The evaluation results show that our system performs very well in both the HK closed track and the HK open track. We think the key to this success is that our unknown word extraction performs better than that of the other participants.
This can be observed from the results of the HK closed track: the 2nd and 3rd systems, which have better R_iv but worse R_oov than our system, perform worse than our system in F score. Furthermore, to achieve better performance, high precision in unknown word extraction is necessary, since one identification error may cause at least two segmentation errors.

The performance in the PK tracks is not as good as in HK. An important reason is that coding conversion may cause errors. For instance, since GB-to-BIG5 conversion is a one-to-many mapping, the GB code of “” (the capital of Brazil) is wrongly converted to “” in BIG5. Errors of this kind significantly affect the accuracy of the segmentation, especially for the unknown word processes. To solve this problem, we think the best and most direct solution is to re-implement the segmentation system for GB without any code conversion.

Variation in the word segmentation standards is another cause of segmentation errors. Some of the standards were not even available to the public. It would be better to propose a uniform word segmentation standard in the future.

Regarding evaluation indexes, we suggest that crossing errors should be taken into consideration as an error type, since non-crossing errors are more or less related to segmentation standards and crossing errors are more severe.

6 References

[1] Chen, K.J. & S.H. Liu, 1992, "Word Identification for Mandarin Chinese Sentences," Proceedings of 14th Coling, pp. 101-107.
[2] Chen, C.J., M.H. Bai, & K.J. Chen, 1997, "Category Guessing for Chinese Unknown Words," Proceedings of the Natural Language Processing Pacific Rim Symposium, pp. 35-40, Thailand.
[3] Chen, K.J. & Ming-Hong Bai, 1998, "Unknown Word Detection for Chinese by a Corpus-based Learning Method," International Journal of Computational Linguistics and Chinese Language Processing, Vol. 3, #1, pp. 27-44.
[4] Chen, Keh-jiann, 1999, "Lexical Analysis for Chinese - Difficulties and Possible Solutions," Journal of Chinese Institute of Engineers, Vol. 22, #5, pp. 561-571.
[5] Chen, K.J. & Wei-Yun Ma, 2002, "Unknown Word Extraction for Chinese Documents," Proceedings of COLING 2002, pp. 169-175.
[6] Tseng, H.H. & K.J. Chen, 2002, "Design of Chinese Morphological Analyzer," Proceedings of SIGHAN, pp. 49-55.
[7] Ma, Wei-Yun & K.J. Chen, 2003, "A Bottom-up Merging Algorithm for Chinese Unknown Word Extraction," Proceedings of SIGHAN.
