










Multilingual Search for Cultural Heritage Archives via Combining Multiple Translation Resources

Gareth J. F. Jones, Ying Zhang, Eamonn Newman, Fabio Fantino
Centre for Digital Video Processing
Dublin City University
Dublin 9, Ireland
{gjones,yzhang,enewman,ffantino}@computing.dcu.ie

Franca Debole
ISTI-CNR
Pisa, Italy
franca.debole @r.it

Abstract

Material in Cultural Heritage (CH) archives may be in various languages, which requires a facility for effective multilingual search. The specialised language often associated with CH content introduces problems for automatic translation to support search applications. The MultiMatch project is focused on enabling users to interact with CH content across different media types and languages. We present results from a MultiMatch study exploring various translation techniques for the CH domain. Our experiments examine translation techniques for the English language CLEF 2006 Cross-Language Speech Retrieval (CL-SR) task using Spanish, French and German queries. Results compare the effectiveness of our query translation against a monolingual baseline and show improvement when combining a domain-specific translation lexicon with a standard machine translation system.

1 Introduction

Online Cultural Heritage (CH) content is being produced in many countries by organisations such as national libraries, museums, galleries and audiovisual archives. Additionally, there are increasing amounts of CH relevant content available more generally on the World Wide Web. While some of this material concerns national or regional content only of local interest, much material relates to items involving multiple nations and languages, for example concerning events in Europe or Asia. Gaining a full understanding of such events, including details contained in different collections and different cultural perspectives, requires effective multilingual search technologies. Facilitating search of this type requires translation tools to cross the language barrier between users and the available information sources.

CH content encompasses various different media, including of course text documents, images, videos, and audio recordings. Search of text documents between languages forms the focus of cross-language information retrieval (CLIR) research, while search for images is the concern of content-based image retrieval. However, whatever their medium, items are accompanied by metadata. Such metadata may include simple factual details such as date of creation, but also descriptive details relating to the contents of the item. Multilingual searching using metadata content requires that either the metadata be translated into a language with which the user is able to search or that the search query be translated into the language of the metadata. This alternative of document or query translation is a well rehearsed argument in CLIR, which has generally concerned itself with full text document searching. However, the features of metadata require a more careful analysis. Metadata is typically dense in search terms, while lacking the linguistic structure and information redundancy of full text documents. The absence of linguistic structure makes precise translation of content problematic, while the lack of redundancy means that accurate translation of individual words and phrases is vital to minimise mismatch between query and document terms. Furthermore, CH content is typically in specialised domains requiring domain-specific resources for accurate translation. Developing reliable and robust approaches to translation for metadata search is thus an important component of search for many CH archives.

The EU FP6 MultiMatch project is concerned with information access for multimedia and multilingual content for a range of European languages.
In the investigation reported in this paper we introduce the first-stage multilingual search functionality of the MultiMatch system, and describe its use in an investigation of multilingual metadata search. Since at present we do not have a search test collection specifically developed for MultiMatch, we use data from the CLEF 2006 Cross-Language Speech Retrieval (CL-SR) task for our experiments (Oard et al., 2006).

The remainder of this paper is organised as follows: Section 2 gives an overview of the MultiMatch search architecture, Section 3 outlines the experimental search task, Section 4 describes the translation resources used for this study, Sections 5 and 6 concern our experimental setup and results, and finally Section 7 summarises our conclusions and gives details of our ongoing work.

2 MultiMatch Search System

The MultiMatch search system is centered on the MILOS Multimedia Repository system (Amato et al., 2004), which incorporates free-text search using Lucene (Hatcher and Gospodnetic, 2004) and image search using the open source image retrieval system GIFT (Müller et al., 2001). In order to support multilingual searching, a number of translation tools are being developed based on standard online machine translation tools and dictionaries augmented with domain-specific resources gathered from the WWW and elsewhere. In this section we briefly introduce the relevant details of MILOS and Lucene.
Since this paper focuses on text search within MultiMatch, we do not describe the multimedia features of the MultiMatch system.

[...] and to deal with heterogeneous metadata, without any constraint on schema design and/or overhead due to metadata translation. Thus, the native XML database/repository system is simpler than a general purpose XML database system, but offers significant improvements in specific areas: it supports standard XML query languages such as XPath and XQuery, and offers advanced search and indexing functionality on arbitrary XML documents. It supports high performance search and retrieval on heavily structured XML documents, relying on specific index structures.

Moreover, XMLSE provides the possibility of using particular indexes. For example, using the configuration file of XMLSE, the system administrator can associate the <abstract> elements of a document with a full-text index, and the MPEG-7 <VisualDescriptor> elements with a similarity search index. XMLSE uses Apache Lucene to provide partial (or approximate) text string matching, effectively providing information retrieval functionality within MILOS. This allows XMLSE to use the ranked searching and wildcard queries of Lucene to solve queries like "find all the articles whose title contains the word XML" and so on. This application allows users to interrogate the dataset combining full text search with exact or partial match search. For example, the user can look for documents whose <metadata> element contains the word "Switzerland". MILOS generates and submits to XMLSE the following XQuery query:

    for $a in /document
    where $a//metadata ~ 'Switzerland'
    return <result>{$a//title},{$a//author}</result>

The query will return a list of results which consist of the title and author of all documents whose metadata contains the term "Switzerland".

2.2 Lucene

Full text search in MILOS is provided by using Lucene as a plugin. Ranked retrieval uses the standard tf×idf vector-space method provided in Lucene (Hatcher and Gospodnetic, 2004). Lucene also provides additional functionality to improve re- [...]

[...] Full details of this collection can be found in (Oard et al., 2006; White et al., 2005).

To explore metadata field search, we used various methods, described in the next section, to automatically translate the French, German, and Spanish topics into English.

4 Translation Techniques

The MultiMatch translation resources are based on the WorldLingo machine translation system augmented with domain-specific dictionary resources gathered automatically from the WWW. This section briefly reviews WorldLingo, and then describes the construction of our augmentation translation lexicons and their application for query translation in multilingual metadata search.

4.1 Machine translation system

There are a number of commercial machine translation systems currently available. After evaluation of several candidate systems, WorldLingo was selected for the MultiMatch project because it generally gives good translations between English, Spanish, Italian, and Dutch, the languages relevant to the MultiMatch project. In addition, it provides a useful API that can be used to translate queries on the fly via HTTP. The usefulness of such a system is that it can be integrated into any application and present translations in real time. It allows users to select the source/target languages and specify the text format (e.g. plain text file or HTML file) of their input files. The WorldLingo translation system also provides various domain-specific dictionaries that can be integrated with the translation system. A particularly useful feature of WorldLingo with respect to MultiMatch, and potentially to applications within CH in general, is that additional locally developed customized dictionaries can be uploaded to improve the quality of translations. This enables the WorldLingo dictionaries to be extended to contain special terms for a specific domain.

[...]tification of the associated terms on the same topic.

Emphasized concepts: In common with standard summarization studies, we observed that the first paragraph of a wikipedia document is usually a concise introduction to the article. Thus, concepts emphasized in the introductory section are likely to be semantically related to the title of the page.

In our study we seek to use these features from multilingual wikipedia pages to compile a domain-specific word and phrase translation lexicon. Our method in using this data is to augment the queries with topically related terms in the document language through a process of post-translation query expansion. This procedure was performed as follows:

1. An English vocabulary for the domain of the test collection was constructed by performing a limited crawl of the English wikipedia, Category:World War II. This category contains links to pages and subcategories concerning events, persons, places, and organizations pertaining to war crimes or crimes against humanity, especially during WWII. It should be noted that this process was neither an exhaustive crawl nor a focused crawl. The purpose of our current study is to explore the effect of translation expansion on metadata retrieval effectiveness. In total, we collected 7431 English web pages.

2. For each English wikipedia page, we extracted its hyperlinks to German, Spanish, and French. The basename of each hyperlink is considered as a term (a single word or multi-word phrase that should be translated as a unit). This provided a total of 4446 German terms, 3338 Spanish terms, and 4062 French terms. As an alternative way of collecting terms in German, Spanish, and French, we are able to crawl the wikipedia in a specific language. However, a page with no link pointing to its English counterpart will not provide enough translation information.

3. For each of the German, Spanish, and French terms obtained, we used the title term, the meta keywords, and the emphasized concepts obtained from the same English wikipedia page as its potential translations.

Table 1: Run descriptions. (Column headings: RUN ID; augmented lexicon using all terms appearing in the following fields.)

For example, consider an English page titled "World War II". The title term, the meta keywords, the emphasized concepts in English, and the associated hyperlinks (to German, Spanish, and French) are shown in Figure 1. We first extract the basenames "Zweiter Weltkrieg" (in German), "Segunda Guerra Mundial" (in Spanish), and "Seconde Guerre mondiale" (in French) using the hyperlink feature. To translate these terms into English, we replace them using the English title term, all the English meta keywords and/or all the English emphasized concepts occurring in the same English wikipedia page. This is a straightforward approach to automatic post-translation query expansion by using meta keywords and/or emphasized concepts as expanded terms. The effects of the features described above are investigated in this work, both separately and in combination, as shown in Table 1.

    Title: World War II
    Hyperlink to German: /wiki/Zweiter_Weltkrieg
    Hyperlink to Spanish: /wiki/Segunda_Guerra_Mundial
    Hyperlink to French: /wiki/Seconde_Guerre_mondiale
    Meta keywords: World War II, WWII history by nation, WWII history by nation, 101st Airborne Division, 11th SS Volunteer Panzergrenadier Division Nordland, 15th Army Group, 1937, 1939, 1940
    Emphasized concepts: World War II (abbreviated WWII), or the Second World War, was a worldwide conflict which lasted from 1939 to 1945. World War II was the amalgamation of two conflicts, one starting in Asia as the Second Sino-Japanese War, and the other beginning in Europe with the Invasion of Poland. The war was caused by the expansionist and hegemonic ambitions of Germany, Italy, and Japan and economic tensions between all major powers.

Figure 1: Title, hyperlinks, meta keywords, and emphasized concepts (underlined terms) extracted from the English wikipedia page /wiki/World II.

5 Experimental Setup

In this section we outline the design of our experiments. We established a monolingual reference (RUN mono) against which we can measure multilingual retrieval effectiveness. To provide a baseline for our multilingual results, we used the standard WorldLingo system to translate the queries (RUN mt). We then tested the MT system integrated with different lexicons compiled using wikipedia. Results of these experiments, for the runs described in Table 1, enable us to gauge the effect of each of our additional translation resources generated using wikipedia.

The focus of this paper is not on optimising absolute retrieval performance, but rather on exploring the usefulness of our translation resources. Thus we do not apply retrieval enhancement techniques such as relevance feedback, which would make it more difficult to observe the impact of differences in behaviour of the translation resources. The experiments use the SUMMARY field as an example of concise natural language descriptions of CH objects, and the AKW1 and AKW2 fields as examples of automatically assigned keyword labels without linguistic structure, with the MKW field providing similar manually assigned keyword labels.
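
The post-translation query expansion of Section 4.2 can be illustrated with a minimal Python sketch. It is not the MultiMatch implementation: the `LEXICON` structure, the `expand_query` function, and the choice to append lexicon terms to the machine-translated query are all assumptions made for illustration, and the emphasized concepts listed are sample phrases from Figure 1, since the originally underlined terms are not recoverable.

```python
from typing import Dict, List

# Hypothetical lexicon layout: a source-language term (the basename of a
# wikipedia inter-language hyperlink) maps to the English title term, meta
# keywords and emphasized concepts of the linked English page.
LexiconEntry = Dict[str, List[str]]

# Toy entry based on the "World War II" page of Figure 1. The "concepts"
# values are illustrative picks from the emphasized paragraph.
LEXICON: Dict[str, LexiconEntry] = {
    "segunda guerra mundial": {
        "title": ["World War II"],
        "meta": ["WWII history by nation", "101st Airborne Division"],
        "concepts": ["Second World War", "Invasion of Poland"],
    },
}


def expand_query(
    mt_translation: str,
    source_query: str,
    lexicon: Dict[str, LexiconEntry],
    fields: List[str],
) -> str:
    """Append lexicon terms to a machine-translated query.

    Every lexicon term found in the source-language query contributes the
    English terms stored under the requested fields ("title", "meta",
    "concepts"). Appending, rather than replacing, is an assumption.
    """
    expanded = [mt_translation]
    lowered = source_query.lower()
    for source_term, entry in lexicon.items():
        if source_term in lowered:
            for field in fields:
                expanded.extend(entry.get(field, []))
    return " ".join(expanded)


if __name__ == "__main__":
    # Title terms only, then meta keywords plus emphasized concepts.
    query = "Crímenes durante la Segunda Guerra Mundial"
    mt_output = "Crimes during the second world war"
    print(expand_query(mt_output, query, LEXICON, ["title"]))
    print(expand_query(mt_output, query, LEXICON, ["meta", "concepts"]))
```

Selecting different `fields` lists corresponds to using title terms, meta keywords, and emphasized concepts separately or in combination; a real system would also need phrase handling and term weighting in the retrieval engine, which this sketch leaves out.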
Retrieval effectiveness is evaluated using standard TREC mean average precision (MAP) and the precision at rank 10 (P@10).

6 Results and Discussion

The results of our query translation experiments are shown in Tables 2, 3, 4, and 5. For search using the SUMMARY and MKW fields, the lexicon compiled using title terms provided an improvement of 7-9%, 7-19%, and 20-30% in the German–English, Spanish–English, and French–English retrieval tasks, respectively. These improvements are statistically significant at the 95% confidence level, and emphasize the importance of a good domain-specific translation lexicon.

The addition of meta keywords or emphasized concepts also improves results in most cases relative to the RUN mt results. However, we can see that retrieval performance degrades when the query is expanded to contain terms from both meta keywords and emphasized concepts. This occurs despite the fact that the additional terms are often closely related to the original query terms. While the addition of all these terms generally produces an increase in the number of retrieved documents, there is little or no increase in the number of relevant documents retrieved, and the combination of the two sets of terms in the queries leads on average to a slight reduction in the rank of relevant documents.

The results show that the RUN mt+t runs provide the best results when averaged across a query set. However, when analysed at the level of individual queries, different combined translation resources are more effective for different queries; examples of this effect are shown in Table 6. This suggests that it may be possible to develop more sophisticated translation expansion methods to select the best terms from different lexicons. At the very least, it should be possible to use "context-sensitive filtering" and "combination of evidence" (Smets, 1990) approaches to improve the overall translation quality. We plan to explore this in further investigations.

Table 3: Results for MKW field search. (RUN mt+t provides the best results in all cases.)

RUN mono: MAP = 0.1049, P@10 = 0.1818

RUN ID         German–English     French–English     Spanish–English
               MAP      P@10      MAP      P@10      MAP      P@10
RUN mt         0.1158   0.1750    0.1000   0.1677    0.0903   0.1677
RUN mt+t       0.1235   0.2100    0.1071   0.2031    0.1171   0.2194
RUN mt+m       0.1171   0.1393    0.1023   0.2000    0.0983   0.1903
RUN mt+c       0.1084   0.1500    0.0958   0.1636    0.1089   0.1667
RUN mt+m+c     0.1069   0.1600    0.0947   0.1727    0.0940   0.1742

Table 5: Results for AKW2 field search. (The best results are in bold.)

RUN mono: MAP = 0.0388, P@10 = 0.1000

RUN ID         German–English     French–English     Spanish–English
               MAP      P@10      MAP      P@10      MAP      P@10
RUN mt         0.0279   0.0375    0.0347   0.0625    0.0205   0.0483
RUN mt+t       0.0279   0.0481    0.0351   0.0680    0.0238   0.0433
RUN mt+m       0.0302   0.0448    0.0361   0.0556    0.0223   0.0484
RUN mt+c       0.0275   0.0414    0.0332   0.0593    0.0268   0.0548
RUN mt+m+c     0.0299   0.0448    0.0351   0.0536    0.0273   0.0581

7 Conclusion and Future Work

This paper reports experiments with techniques developed for domain-specific lexicon construction to facilitate multilingual metadata search for CH retrieval tasks. The results show that our techniques can provide a statistically significant improvement in the retrieval effectiveness. Using a tailored translation lexicon enables us to achieve (77%, 78%), (86%, 67%) and (75%, 63%) of the monolingual effectiveness in German–English, Spanish–English, and French–English multilingual metadata SUMMARY, MKW field search tasks. In addition, the multilingual wikipedia proved to be a rich resource of translations for domain-specific terms.
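
For reference, the two evaluation measures used throughout, MAP and P@10, can be computed as in the following Python sketch. The function names and data layout are assumptions chosen for illustration, not the evaluation tooling used in the experiments; `percent_of_monolingual` reflects one plausible reading of the "percentage of monolingual effectiveness" figures, namely the ratio of a cross-language score to the monolingual score.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at the rank of each relevant document.

    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Average of per-query average precision over all judged queries."""
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores)


def percent_of_monolingual(cross_language: float, monolingual: float) -> float:
    """Cross-language score as a percentage of the monolingual score."""
    return 100.0 * cross_language / monolingual


if __name__ == "__main__":
    # Hypothetical ranked lists and relevance judgements for two queries.
    runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d9", "d5"]}
    qrels = {"q1": {"d1", "d3"}, "q2": {"d5"}}
    print(mean_average_precision(runs, qrels))
    print(precision_at_k(runs["q1"], qrels["q1"], k=10))
```

Dividing by `k` in `precision_at_k` even when fewer than `k` documents are retrieved follows the usual TREC convention.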
Intuitively, document translation is superior to query translation. Documents provide more context for resolving ambiguities (Oard, 1998), and the translation of source documents into all the languages supported by the retrieval system effectively reduces CLIR to a monolingual IR task. Furthermore, it has the added advantage that document content is accessible to users in their native languages. In our future work, we will compare the effectiveness of these two approaches to metadata search in a multilingual environment.

Table 6: Examples of MAP values obtained using different translation combinations for SUMMARY field search. (The best results are in bold.)

Spanish–English
Query    WorldLingo   Title terms   Meta keyword   Emphasized concepts   Meta keyword + Emphasized concepts
1623     0.0063       0.0063        0.1014         0.0084                0.0334
3007     0.0000       0.0004        0.0028         0.0048                0.0057

Acknowledgement

Work partially supported by European Community under the Information Society Technologies (IST) programme of the 6th FP for RTD, project MultiMATCH, contract IST-033104. The authors are solely responsible for the content of this paper. It does not represent the opinion of the European Community, and the European Community is not responsible for any use that might be made of data appearing therein.

References

Sisay Fissaha Adafre and Maarten de Rijke. 2005. Discovering missing links in wikipedia. In Proceedings of the 3rd international workshop on Link discovery, pages 90–97, Chicago, Illinois. ACM Press.

Sisay Fissaha Adafre and Maarten de Rijke. 2006. Finding similar sentences across multiple languages in wikipedia. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, pages 62–69, Trento, Italy.

Giuseppe Amato, Claudio Gennaro, Fausto Rabitti, and Pasquale Savino. 2004. Milos: A multimedia content management system for digital library applications. In Proceedings of the 8th European Conference on Research and Advanced Technology for Digital Libraries, Lecture Notes in Computer Science, pages 14–25. Springer-Verlag.

Gosse Bouma, Ismail Fahmi, Jori Mur, Gertjan van Noord, Lonneke van der Plas, and Jorg Tiedemann. 2006. The university of groningen at QA@CLEF 2006 using syntactic knowledge for QA. In Working Notes for the Cross Language Evaluation Forum 2006 Workshop, Alicante, Spain.

Thierry Declerck, Asunción Gómez Pérez, Ovidiu Vela, Zeno Gantner, and David Manzano-Macho. 2006. Multilingual lexical semantic resources for ontology translation. In Proceedings of the 5th International Conference on Language Resources and Evaluation, Genoa, Italy.

Erik Hatcher and Otis Gospodnetic. 2004. Lucene in Action (In Action series). Manning Publications Co., Greenwich, CT, USA.

Henning Müller, Wolfgang Müller, and David McG. Squire. 2001. Automated benchmarking in content-based image retrieval. In Proceedings of the 2001 IEEE International Conference on Multimedia and Expo, Tokyo, Japan. IEEE Computer Society.

Douglas W. Oard, Jianqiang Wang, Gareth J. F. Jones, Ryen W. White, Pavel Pecina, Dagobert Soergel, Xiaoli Huang, and Izhak Shafran. 2006. Overview of the CLEF-2006 cross-language speech retrieval track. In Working Notes for the Cross Language Evaluation Forum 2006 Workshop, Alicante, Spain.

Douglas W. Oard. 1998. A comparative study of query and document translation for cross-language information retrieval. In Proceedings of the 3rd Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup, pages 472–483, London, UK. Springer-Verlag.

Philippe Smets. 1990. The combination of evidence in the transferable belief model. IEEE Transaction on Pattern Analysis and Machine Intelligence, 12(5):447–458.

Ryen W. White, Douglas W. Oard, Gareth J. F. Jones, Dagobert Soergel, and Xiaoli Huang. 2005. Overview of the CLEF-2005 cross-language speech retrieval track. In Carol Peters, Fredric C. Gey, Julio Gonzalo, Henning Müller, Gareth J. F. Jones, Michael Kluck, Bernardo Magnini, and Maarten de Rijke, editors, CLEF, volume 4022 of Lecture Notes in Computer Science, pages 744–759. Springer.
