The architecture of the festival speech synthesis system

合集下载

国际艺术节讲话稿

国际艺术节讲话稿

国际艺术节讲话稿英文回答:Ladies and gentlemen, honored guests, dear friends,。

It is with great pleasure and honor that I welcome you all to our inaugural International Arts Festival. This festival is a celebration of the diversity and beauty of human creativity, and we are thrilled to have artists from all over the world join us for this special occasion.The arts have the power to transcend borders, cultures, and languages. They can bring people together, inspire us, and help us to understand the world around us. Thisfestival is a testament to the power of the arts to unite us all.We have a wide range of events planned for this festival, including music, dance, theater, visual arts, and film. There will be something for everyone to enjoy, and weencourage you to explore all that the festival has to offer.I would like to take this opportunity to thank all ofthe artists who have traveled far and wide to be with us today. Your passion and dedication to your craft is an inspiration to us all. I would also like to thank our sponsors, whose generous support has made this festival possible.Finally, I would like to thank all of you for coming tonight. Your presence here shows your commitment to thearts and your belief in their power to make the world a better place.Please enjoy the festival!中文回答:尊敬的各位来宾、亲爱的朋友们:非常荣幸地欢迎大家参加我们的首届国际艺术节。

2024 元宵节晚宴发言稿简短英文

2024     元宵节晚宴发言稿简短英文

2024 元宵节晚宴发言稿简短英文Ladies and gentlemen, distinguished guests,It is with great pleasure that I stand before you tonight on this auspicious occasion of the Lantern Festival in 2024. This festival holds a special place in our hearts as we gather to celebrate the end of the Lunar New Year festivities.Tonight, as we come together to partake in this grand feast, let us remember the significance and traditions behind this beautiful festival. The Lantern Festival symbolizes unity and harmony as families and friends unite under the glow of the lanterns, marking the end of the festive season.As we indulge in the delicious delicacies laid before us, let us not forget the messages of the lanterns that light up the night sky. Each lantern represents our hopes, dreams, and aspirations for the year ahead. So, as we savor every bite, let us also take a moment to reflect on our own personal goals and how they align with the collective wishes of our community.The Lantern Festival is not just a celebration of peace and joy but also a time for us to appreciate the beauty of Chinese culture. The intricate lantern designs, the mesmerizing lion dances, and the melodic sounds of the traditional music transport us to a world of history and heritage. Tonight, we truly immerse ourselves in this rich tapestry as we feel the rhythm of the drums and witness the grace of the dancers.Moreover, this festival serves as a reminder of the importance offamily and loved ones. It is a time to show our appreciation for the support and love they provide, just as the lanterns shine brightly in the darkness, symbolizing the warmth and glow of familial ties.Let us embrace this festive spirit and cherish this moment together. May the Lantern Festival of 2024 bring us joy, prosperity, and fortitude as we embark on the journey ahead. May it inspire us to reach for the stars and guide us towards a future filled with success and happiness.Thank you and enjoy the evening.。

艺术节、英语节双语致词

艺术节、英语节双语致词

尊敬的各位老师,亲爱的同学们:早上好!今天,阳光明媚,我们迎来了北京师范大学南湖附属学校小学部的艺术节和英语节的开幕仪式。

我们将通过开展一系列英语节、艺术节主题活动给全体同学搭建一个英语和艺术学习的平台。

英语节和艺术节的活动将能使你更好地锻炼你的英语口语表达能力和艺术表现能力、协调能力和组织活动能力。

在接下去的时光里,每个人都能勇敢地去听,去说,去读,去表现。

这是我们老师希望看到的,并且每一位老师都将尽全力来帮助大家。

让我们在英语和艺术的天空里放飞希望, 在学习的海洋中健康成长!最后, 预祝我校英语节和艺术节取得圆满成功!谢谢大家!Good morning , dear leaders, teachers and our students,The sun is shining, we welcome the Arts Festival and the English festival of NanHu school affiliated to BeiJing Normal university.Through these festivals, all of you can have more chances to attend into the activities about English language and arts. These festivals are wonderful medias for you to practice your oral English and artistic performance ability .Also,your coordinating and organizing skills will benefit from them. In the coming month, everyone can be brave to listen, speak, read and act out. This is what we wish you to do and all the teachers will do our best to help you!Let’s fly our hopes in the sky of English and arts,and grow up healthily by studying.At last, may a complete success of the English festival and the arts festival of our school.Thank you for your listening!。

传统节日英文演讲2分钟

传统节日英文演讲2分钟

传统节日英文演讲2分钟Ladies and gentlemen,Today, I would like to talk to you about traditional festivals that are celebrated all around the world.Throughout history, humans have always celebrated important occasions, such as the change of seasons or religious beliefs, through festivals. Festivals not only offer us a chance to connect with our cultural heritage but also help us to understand the histories and traditions of different communities.One of the most celebrated festivals in the world is Christmas, which commemorates the birth of Jesus Christ. The festival is celebrated with various traditions, such as decorations, caroling, and gift-giving. Another popular festival is Diwali, which is the festival of lights celebrated by Hindus, Sikhs, and Jains. It represents the victory of light over darkness and is celebrated through the lighting of candles, fireworks, and sweets.Furthermore, the Chinese New Year is another major festival observed globally. It marks the beginning of a new lunar year and is celebrated with many customs like red envelopes, lion dances, and fireworks. In addition, there is also the Jewish festival of Hanukkah, which is celebrated for eight nights and days. It is observed with the lighting of a menorah, gift-giving, and eating traditional foods.Overall, traditional festivals play an important role in our lives as they offer us an opportunity to appreciate our cultural heritage andenable us to celebrate our identity. These festivals help to promote a sense of community, and allow us to connect with people from all backgrounds.Thank you for listening.。

节日贺词英语演讲稿

节日贺词英语演讲稿

节日贺词英语演讲稿(经典版)编制人:__________________审核人:__________________审批人:__________________编制单位:__________________编制时间:____年____月____日序言下载提示:该文档是本店铺精心编制而成的,希望大家下载后,能够帮助大家解决实际问题。

文档下载后可定制修改,请根据实际需要进行调整和使用,谢谢!并且,本店铺为大家提供各种类型的经典范文,如职场文书、合同协议、总结报告、演讲致辞、规章制度、自我鉴定、应急预案、教学资料、作文大全、其他范文等等,想了解不同范文格式和写法,敬请关注!Download tips: This document is carefully compiled by this editor. I hope that after you download it, it can help you solve practical problems. The document can be customized and modified after downloading, please adjust and use it according to actual needs, thank you!Moreover, our store provides various types of classic sample essays for everyone, such as workplace documents, contract agreements, summary reports, speeches, rules and regulations, self-assessment, emergency plans, teaching materials, essay summaries, other sample essays, etc. If you want to learn about different sample essay formats and writing methods, please stay tuned!节日贺词英语演讲稿中国的传统节日形式多样,内容丰富,是我们中华民族悠久的历史文化的一个组成部分。

中国传统节日英文演讲两分钟

中国传统节日英文演讲两分钟

中国传统节日英文演讲两分钟Good afternoon everyone,Today, I would like to talk to you about traditional Chinese festivals. As we all know, China has a rich cultural heritage and many traditional festivals that have been celebrated for centuries.One of the most important and well-known festivals is the Spring Festival, also known as Chinese New Year. It is celebrated on the first day of the lunar calendar and lasts for 15 days. During this time, families gather together, enjoy delicious food and exchange gifts. Red envelopes containing money are given to children and unmarried adultsto bring good luck and blessings for the coming year. The streets are filled with beautiful lanterns and traditional lion or dragon dances are performed to drive away evil spirits.Another significant festival is the Mid-Autumn Festival, also known as the Moon Festival. It is celebrated on the 15th day of the eighth lunar month when the moon is believed to be the fullest and brightest. Family members come together and enjoy mooncakes, a round pastry filled with lotus seed paste or other sweet fillings. It is a time for people to appreciate the beauty of the full moon and to cherish family ties.Dragon Boat Festival is yet another important festivalin China. It falls on the fifth day of the fifth lunar month and is a day to commemorate the legendary poet Qu Yuan. People participate in dragon boat races and eat sticky ricedumplings, wrapped in bamboo leaves, to ward off evil spirits and honor Qu Yuan's sacrifice.Apart from these, there are many other festivals with their own unique customs and traditions in China, such as the Lantern Festival, the Tomb-Sweeping Festival, and the Double Ninth Festival.These traditional festivals not only provide an opportunity for families to come together and strengthentheir bonds, but also serve as a way to pass down cultural values and beliefs from generation to generation. They are a reflection of our deep-rooted traditions and are an integral part of our cultural identity.In conclusion, traditional Chinese festivals such as the Spring Festival, the Mid-Autumn Festival, and the Dragon Boat Festival hold great significance in our culture. They notonly bring joy and happiness to our lives, but also help to preserve and promote our rich cultural heritage. Let us continue to celebrate and cherish these traditional festivals, and share their beauty and meaning with people from aroundthe world.Thank you.。

2024 元宵节英文发言稿

2024     元宵节英文发言稿

2024 元宵节英文发言稿Dear all,Good evening and welcome to our gathering on this special occasion of the Lantern Festival in 2024. Tonight, we are here to celebrate the traditional Chinese festival, also known as the Spring Lantern Festival, which signifies the end of the Lunar New Year celebrations.Firstly, let me extend my heartfelt appreciation to all of you for joining us tonight. It is an honor to have such a diverse group of individuals coming together to experience the beauty and richness of our cultural heritage. Tonight, we gather to not only enjoy the mesmerizing lights of the lanterns but also to immerse ourselves in the lively atmosphere, filled with laughter, joy, and unity.The Lantern Festival has a long history, originating from the Han Dynasty. Traditionally, this festival marks the return of spring and the beginning of agricultural activities. The lanterns, which illuminate the night sky, symbolize the hope for a prosperous and bountiful year ahead. It is a time for families to come together, to enjoy delicious food, to appreciate the colorful lantern displays, and to engage in various festive activities.Tonight, we have prepared an array of entertainment and activities for everyone to enjoy. We have traditional lion and dragon dances, which represent good luck and prosperity. We also have performances showcasing talents in Chinese martial arts, Chinese folk music, and dances, all of which are vibrant and captivating.On this joyful occasion, let us not forget the significance of unity and togetherness. The Lantern Festival teaches us the value of coming together as a community, regardless of our differences. It reminds us that light can overcome darkness, and that by working together, we can overcome any challenge that comes our way.As we celebrate the Lantern Festival, let us also remember to be mindful of the environment. In recent years, efforts have been made to promote eco-friendly celebrations. Therefore, I encourage everyone to use biodegradable and LED lights, minimizing our impact on the environment and ensuring that the beauty of this festival shines through for generations to come.In conclusion, I would like to express my gratitude once again to each and every one of you for being here tonight. Let us rejoice in the spirit of the Lantern Festival, enjoying the enchanting ambiance, savoring the mouthwatering delicacies, and appreciating the cultural heritage that unites us all.May the lanterns guide us on a path of enlightenment, prosperity, and harmony in the year ahead. Happy Lantern Festival!Thank you.。

春节文化致辞

春节文化致辞

春节文化致辞Chinese New Year, also known as Spring Festival, is one of the most significant traditional Chinese holidays. 春节,也被称为春节,是中国最重要的传统节日之一。

It is a time for families to come together, honor their ancestors, and celebrate the arrival of a new year. 这是一个让家人团聚在一起,祭拜祖先,庆祝新年来临的时刻。

The festival lasts for 15 days, starting from the first day of the lunar calendar and ending on the fifteenth day with the Lantern Festival. 这个节日持续15天,从农历的第一天开始,直到第十五天的元宵节结束。

One of the most important aspects of Spring Festival is the tradition of giving red envelopes filled with money to children and unmarried adults. 春节最重要的一个方面是给孩子和未婚成年人发红包的传统。

This act symbolizes good luck and prosperity for the recipients in the coming year. 这一举动象征着收到红包的人在接下来的一年里能够得到好运和繁荣。

The color red is believed to ward off evil spirits and bring good fortune, which is why all decorations, clothing, and envelopes associated with the festival are in this color. 红色被认为能够驱逐邪灵,带来好运,这就是为什么与春节有关的所有装饰、服装和信封都是这种颜色的原因。

  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

THE ARCHITECTURE OF THE FESTIV AL SPEECH SYNTHESISSYSTEMPaul Taylor Alan W Black Richard CaleyCentre for Speech Technology Research,University of Edinburgh,80,South Bridge,Edinburgh EH11HN,UKpault,awb,rjc@ABSTRACTWe describe a new formalism for storing linguistic data in a text to speech system.Linguistic entities such as words and phones are stored as feature structures in a general object called an linguistic item.Items are configurable at run time and via the feature struc-ture can contain arbitrary information.Linguistic relations are used to store the relationship between items of the same linguistic type.Relations can take any graph structure but are commonly trees or lists.Utterance structures contain all the items and rela-tions contained in a single utterance.Wefirst describe the design goals when building a synthesis architecture,and then describe some problems with previous architectures.We then discuss our new formalism in general along with the implementation details and consequences of our approach.1.INTRODUCTIONSpeech synthesis systems require ways of storing the various types of linguistic information produced in the process of con-verting the input format(e.g.text)into speech.In this paper we present a new formalism for representing arbitrary linguistic data and show how this helps in building a speech synthesis system. The are a number of design considerations which builders of syn-thesis architectures must take into account.We have listed these as follows:Simple linguistic objects such as words,phones,syllables and phrases need to be represented.A co-indexing mechanism is needed whereby given a word,one canfind the phones that comprise this word.Changes should be localized.For example,a change in the syllabification algorithm should only require changing those parts of the program directly involved with syllabification-other modules should not be affected.There should be no redundancy or duplication of informa-tion in the system.For instance,it is common to want to know the start and end times of linguistic entities.The end time of a word will be the same as the end time of the last phone in that word.If this information is stored separately for the phone and word,problems will occur if one value is changed as the other will then become out of date.By their very nature,multi-lingual systems must support a wide variety of linguistic theories.Hence the architecture should not be tied to any particular linguistic theory or for-malism.The architecture should be fast and efficient as the synthe-sizer is intended for real-time use.Most importantly,the real purpose of the architecture is to allow speech synthesis algorithms to be written as easily as possible.It is therefore important that the architecture should be unobtrusive and provide the sorts of structures and information that synthe-sis algorithms need.All programs must deal with infrastructure issues such as data storage,file i/o,memory allocation etc.If left unchecked,algorithms can easily become bogged down with this sort of code,which becomes intertwined with the actual al-gorithm itself.A good architecture will abstract the infrastructure to such an extent that these aspects are hidden,so that synthesis algorithms can be easily written and read without other issues get-ting in the way.The interface should be easy to use,and make the writing of synthesis algorithms easier and quicker than by purely ad-hoc methods.2.BACKGROUNDIn this section we review some previous types of architecture. 2.1.String processingMany early synthesis systems used what has been referred to as a string re-writing mechanism as their central data structure.In this formalism,the linguistic representation of an utterance is stored as a string.Initially,the string contains text,which is then re-written or embellished with extra symbols as processing takes place.Sys-tems such as MITalk[1]and the CSTR Alvey synthesizer[5]used this method.There are many short comings in this formalism which have been previously recognized([6],[3],[4],[7]).The main problem with the string formalism is that it soon becomes unwieldy for anything apart from the most trivial of tasks.Often the string becomes very complex with words,phrase symbols,stress symbols,phones etc all mixed in together.There are two main ways in which modules process such strings.Modules can work on this string directly,but the interpretation of the symbols often gets in the way of the al-gorithm itself.Alternatively,a module can parse the string into an internal format and let the algorithms use that.Although this may simplify the writing of the algorithms themselves,this is a very unwieldy approach as it means the string has to be parsed every time a module is called.Moreover,this often leads to each module having an individual internal data structure,which is unattractive from a programming point of view as new structures and tech-niques have to be learnt to understand the workings of any new module.To lessen these problems,information is often deleted from the string so as to keep only what is perceived as essential informa-tion.For instance,after the grapheme to phoneme conversion, the orthographic form of the word may be deleted from the string. This can have unfortunate consequences in that information which might potentially be of use to a module may have been deleted by an earlier module.2.2.Multi-level data structuresIn recognition of these shortcomings,many current systems have abandoned string based processing and now use multi-level data structures(MLDS).The most famous of these systems,known as Delta,was developed by Hertz[6],but many other systems,such as Chatr[3],the Bell labs system[7],Polyglot[4],and early ver-sions of Festival[2]use similar formalisms.In the multi-layered formalism,different types of linguistic information are held in separate streams which are linear lists or arrays of linguistic items. For example we may have a word stream,a phone stream and a syllable stream.Some systems have afixed set of these while others an allow arbitrary number.Often algorithms need to know what phones are related to a given word,and hence the streams must be co-indexed.There are two main types of co-indexing.In Delta,streams are aligning by the edges of items.Tofind the phones in a word,one goes to the be-ginning of the word,traces the edge“down”to the phone stream, and they progresses along the phone stream until the edge relating to the end of the word is found.An alternative strategy is to align by the“centres”of items.In this case,a word contains a set of links to the phones that are related to it,and the phones in a word can be found by following these links.While multi-level data structures are far preferable to string struc-tures they still have serious drawbacks.The main drawback stems from the fact that this forces all information to be represented by linear structures:other types of structure,specifically trees, are very hard to represent.Partly because of this,the number of streams in a system can become considerable which leads to difficulties in co-indexing the items in streams.With the delta co-indexing,it is often the case that for a given item in one stream, there is no corresponding item in other streams.For example,a pause is represented by a item in the phone stream,but there is no equivalent item in the syllable or word streams.Thus a“hole”must be created in these streams to ensure the co-indexing works.With a large number of streams,the number of holes can become considerable which makes processing awkward.In the centre-linking paradigm,the hole problem is absent,but because each item must be explicitly linked to items in other streams,the num-ber of links can become very large.Furthermore,if a new stream is added,one would have to link every existing item to the items in the new stream to ensure full connectivity.As this isn’t possi-ble in practice,the streams are often left partially connected.This can cause confusion in writing module,as one may be unsure of what streams are linked to what.Another more subtle problem occurs in multi-level structures due to a lack of clarity as to what an item really represents.Often(not always)each item has a single value,typically a name.While it is obvious that a suitable phone name could be something like /h/or/e/and a word“hello”,what should the name of a syllable be?Commonly,syllables are regarded as organisational units, which serve to group together phones.As such,they don’t have a distinct name.One could group together the names of the phones and make that the syllable name(e.g.“/h e l/”for thefirst syllable of“hello”),but this is redundant,somewhat artificial and likely to cause errors if the phone representation is changed.3.A FORMALISM BASED ONINTERSECTING RELATIONSThe most significant difference between the Festival architecture and those using MLDS is that Festival does not constrain linguis-tic items to be in linear lists:any graph structure is allowed.Addi-tionally,items can be contained in more than one structure,which leads to more efficient representations.We have also generalised the content of an item such that information of arbitrary complex-ity can be easily stored.3.1.RelationsWe used the term relation to represent our generalisation of the stream concept.A relation is a data structure which is used to organize linguistic items into linguistic structures such as trees and lists.A relation is a set of named links connecting a set of nodes.Lists(streams in the MLDS)are linear relations where each node has a previous and next link.Nodes in trees have up and down links also.Nodes are purely positional units,and contain no information apart from their links.In addition to having links to other nodes,each node has a single link to an item,which contains the linguistic information.Figure1shows an example utterance with a syntax relation and a word relation.The word relation contains nodes which have next and previous connections,whereas the syntax relation,has up and down connections in addition to next and previous.Each node in the syntax tree is linked to a item,each of which has a feature,CAT,giving its syntactic category.Terminal nodes in a syntax tree are words,and so an additional feature name is used here.The nodes in the word relation,which is a linear list,arepreviousFigure1:An example representation of an utterance structure. This example shows the word relation and the syntax relation. The syntax relation(shown on top)is a tree with links connecting the nodes,shown as black circles.The word relation(shown on the bottom)is a list.The items contain the actual linguistic in-formation and are shown in the rounded boxes.The dotted lines show the connections between the nodes and items.also linked to the items that are linked to the terminal nodes in the syntax tree.In this way,node structures of arbitrary complexity can be con-structed,and they can be intertwined in a natural way by having links from different nodes to the same item.3.2.Items and FeaturesItems consist of a bundle of features and a set of named links to nodes in relations.Items can be linked into any number of rela-tions-in the above example words were linked into2relations, but in principle any number of relations is possible.The information in items themselves are represented by features which are stored as a list of key-value pairs.Feature values are commonly numbers or strings,but may also take complex objects as values if necessary.A item can have arbitrarily many features, including zero:syllable items often have no features at all,for example.As well as simple values,features can also take functions as val-ues.Function features are a powerful facility which can help to greatly reduce the amount of redundant information in a utterance structure.Consider the case of a simple phone segmentation of an utterance, which comprises a contiguous list of named segments each with timing information.If the phones are properly contiguous,only one timing value for each phone is needed to fully represent all the timing information of the segmentation.If the end point is used,the start time of an item will be equal to the end point of the previous item and the duration will be equal to the start sub-tracted from the end.However,it is often useful to have access to the start times and duration of the phone also.Without the use of function features,there are two choices.Either the end infor-mation alone can be stored and calculations can be done on the fly to compute start times and durations,or else the start and du-rations can be written in as additional features to the item.The first solution is unattractive as this can make algorithm writing unwieldy and overly complicated.While the second solution will may make algorithm writing easier,it involves effectively copy-ing information,which can lead to out of date information being present.Function features provide a neat solution to this problem.In the current example,a start function is written(in Festival this can be written in C++or scheme)which looks at the previous item and returns its end.This function is then assigned as the value of a key named“start”in the items in question.When one accesses the start feature a time value is returned as if it were actually stored.A duration function returns the value of the start feature subtracted from the end feature.The duration function simply evaluates the feature named“start”-it doesn’t need to know if it is a simple feature or a function feature.In our current usage of the Festival architecture,we only keep time positions in items linked to the segment relation:all other times are calculated by functions.Different start functions can be written for different purposes.For example,the end function assigned to the non-terminal nodes in the syntax tree descends from the current node until a terminal node is encountered and that node’s item’s end value is returned.Of course the terminal node’s item,being a word,has its end feature as a function also-a common operation is for the word’s end function to return the time of the last syllable it is related to,which returns the time of the last phone etc.In each case a separate end function is used, but all are assigned to the key“end”in the item.This means that algorithms can evaluate the end feature on any item,and be sure of a legitimate value being returned without having to worry about the details of calculation.Although the function feature implementation is very efficient in Festival,it can still sometimes be expense to constantly evaluate these functions in the middle of a time critical loop.A global evaluation facility is therefore provided which can evaluate all the feature functions in items in a relation and re-write the features with their evaluation.After the loop,these can be discarded and the feature functions used again,thus ensuring little chance of out of date information being present.Function features can be used for a variety of purposes other than timing.Another common usage in Festival is to use functions forintonation.Typically,intonation accents are associated with syl-lables.Thus one can ask whether a syllable is accented or not. However it is also useful to know if words are accented also.In this case we define a function feature on word items which looks at that word’s main stressed syllable and returns true if that sylla-ble is accented.3.3.UtterancesUtterance structures are collections of relations.The best way to visualize an utterance is to think of it containing a unordered set of items,each of which is made up from a set of features.Relations are then structures comprising of nodes,each of which indexes into an item.An utterance structure simply collects these together to form a single object.4.IMPLEMENTATIONWhile we will not go into the actual low level details of our im-plementation here,it is useful to raise some points about how the nature of the programming language used affects the implemen-tation.Festival is implemented in two languages,C++and scheme(a variant of lisp).While in principle it would be attractive to imple-ment the system in a single language,practical reasons concerning the nature of programming languages necessitate the approach we have taken here.In addition to being a research platform,Festival also operates as a run-time system and hence speed is vital.Because of this,it is necessary to have substantial amounts of the code written in a compiled low-level language such as C or C++.For particular types of operations,such as the array processing often used in signal processing,a language such as C/C++is much faster than higher-level alternatives.However,it is too restrictive to use a system that is100%compiled,as this prevents essential run-time configuration as the following examples demonstrate.In developing an algorithm,it is often useful to try alternatives. With a completely compiled system,trying alternatives would in-volve changing the code and then re-compiling and running the program.While this may be an acceptable effort for two or three algorithm variations,it soon becomes impractical for larger num-bers,especially when a large set of alternatives need to be tried as part of an experiment.With an interpreter,the changes can be made at run time,and it is often a simple matter to write a scheme script which can iterate through all the alternatives and produce a table of results.Festival is used for synthesizing many languages and it would be impossible to re-configure the compiled code for each.In practi-cal usage,Festival must therefore beflexible enough so that any algorithm in any language can be implemented without the re-compilation of existing code.Due to this requirement,structures such as features,items and relations have been implemented so that any number of each with any name can be used.Specifically the use of C structures,where thefields in the structure would correspond to entities such as“stress”,“end”or“part of speech”have been avoided as the addition of any new feature would re-quire a re-compilation.Instead,features are represented by ex-tensible key-value lists.Feature names are stored as strings and efficient functions are used to return the value of a feature given the string name.The relations in an utterance are also stored as an extensible list,with strings being used to provide access to a given relation.Hence there is nothing in the C++architecture which dictates what relations,items or features should be called. All items,relations and features are of exactly the same type in C++regardless of what linguistic information they carry.It is only at run time that their linguistic function is designated.We have found from experience that when designing a complex architecture such as this,it is important to take into account the expectations of a programmer with regard to a language their are experienced with.In a previous architecture we developed([3]) we achieved the some of the generality and run-timeflexibility described above.However the C language constructs we used we often obscure,stretching the language to its very limits.Because of this,even experience C programmers found the system very difficult to program with simply because much of the code didn’t look like recognizable C to them.In Festival we have paid much more attention to this problem as the current C++interface seems fairly natural and unobtrusive.File I/O functions have been written so that utterance structures be saved and loaded to and from disk at any time in the synthesis procedure.In fact,this facility has proved so useful that we now store many of our speech databases(which contained phone,word and intonation information)in this format.4.1.Core System and ModulesIn Festival we make afirm distinction between the core system and the modules which actually perform speech synthesis tasks. The core system,which includes the architecture is written com-pletely in C++and doesn’t change.Modules on the other hand can be written in C++or scheme and can be added or taken away form the system with minimal disruption(adding or taking away a C++module requires a re-linking of course,but not a major re-compilation).The decision of which language to write a module in is largely a matter of choice.Because of the interpreter aspect, it is usually easiest to develop modules in scheme and maybe for efficiency re-write them in C++after they are stable.Some types of programming(e.g.arrays in C++,recursion in scheme)are more natural in one language than another.Finally,depending on personal experience,some programmers simply prefer one to the other.As far as possible we have provided identical interfaces to the architecture in both languages which makes switching between each relatively easy.4.2.Other LanguagesNothing in the design of the system architecture is specific to the languages we have used in our implementation.C++was used to facilitate better data abstraction in the classes/structures of the architecture,but C could have also been used.Scheme was used firstly because it is a small and clean scripting language and sec-ondly because the lisp s-expression(bracketed string)data struc-ture allows a generic way to store complex data structures.Other languages would be also be suitable,and we have made some progress towards providing alternative scripting languages in the style of perl/unix shell and Java etc.Ideally the compiled part and interpreted part would be in the same language.Until recently no available language has this degree offlexibility,but perhaps a future generation of Festival could accomplish this by using Java.5.CONCLUSIONIn summary,we feel that the following features of the current Fes-tival architecture go a substantial way to reaching the design goals mentioned in the introduction.Complex relations:allows trees,lists and other linguistic structures to be represented in the same formalism.Inter-secting relations allow an item to be in more than one rela-tion which saves on redundancy.Feature structures in items:allows arbitrary amounts and types of information to be used to describe linguistic objects.Due to run-time configurability,no re-compilation is needed for new types of information to be stored.Function features:allows useful information to be calcu-lated on thefly,reducing redundancy.Speed:efficient implementation ensures that the expres-sive power of the architecture does not impose a prohibitive speed cost.Natural interface:quick to learn and hence programmers are not intimidated when learning to write modules for Fes-tival.That said,we acknowledge that architecture design is never a solved problem.As the standard of the architecture increases, so do the demands and expectations of the programmers using it, and hence new design features are always required.However,we feel that the features listed above are a substantial improvement on previous architectures and go along way to facilitating quick and easy speech synthesis programming and algorithm implemen-tation.AcknowledgementsWe gratefully acknowledge the support of the UK Engineering and Physical Science Research Council(grants GR/L53250and GR/K54229)and Sun Microsystems.6.REFERENCES1.J.Allen,S.Hunnicut,and D.Klatt.From Text to Speech:the MITalkSystem.Cambridge University Press,1987.2. A.W.Black and P.Taylor.The Festival Speech Synthesis System:system documentation.Technical Report HCRC/TR-83,Human Com-munciation Research Centre,University of Edinburgh,Scotland,UK, 1997.Avaliable at /projects/festival.html. 3.Alan W.Black and Paul A.Taylor.CHATR:A generic speech synthe-sis system.In COLING’94,Kyoto,Japan,1994.4.Louis Boves.Considerations in the design of a multi-lingual text-to-speech system.Journal of Phonetics,19(1):309–327,1991.5.W.N.Campbell,S.D.Isard,A.I.C.Monaghan,and J.Verhoven.Du-ration,pitch and diphones in the CSTR TTS system.In International Conference on Speech and Language Processing’90,Kobe,Japan, 1990.6.Susan R.Hertz.The delta programming language:an integrated ap-proach to non-linear phonology,phonetics and speech synthesis.In John Kingston and Mary E.Beckman,editors,Papers in Laboratory Phonology1.Cambridge University Press,1990.7.Richard Sproat and Joseph Olive.A modular architecture for multi-lingual text-to-speech.In Second ESCA/IEEE Workshop on Speech Synthesis,New York,1994.。

相关文档
最新文档