A Bottom-up Merging Algorithm for Chinese


Strapdown Inertial Navigation Integration Algorithm Design Part Velocity and Position Algorithms


Received July 7, 1997; revision received Oct. 9, 1997; accepted for publication Oct. 9, 1997. Copyright © 1997 by Strapdown Associates, Inc. Published by the American Institute of Aeronautics and Astronautics, Inc.
I. Introduction
A STRAPDOWN inertial navigation system (INS) is typically composed of an orthogonal three-axis set of inertial angular rate sensors and accelerometers providing data to the INS computer. The inertial sensors are directly mounted (strapdown) to the INS chassis structure, in contrast with original INS technology that utilized an active multiaxis gimbal isolation mounting assembly.
Nomenclature
C_A2^A1 = direction cosine matrix that transforms a vector from its A2 frame projection form to its A1 frame projection form
I = identity matrix
 = column matrix with elements equal to the projection

UTS-702 Telephone Handset User Manual

ANSI/IEEE C95.1-1992, IEEE Standard for Safety Levels with Respect to Human Exposure to Radio Frequency Electromagnetic Fields, 3 kHz to 300 GHz.
ANSI/IEEE C95.3-1992, IEEE Recommended
Key Lock ..... 33
Security ID ..... 34
Dial Lock ..... 35
 To activate dial lock ..... 35
 To deactivate dial lock ..... 36
Phonebook Lock ..... 37
Other Function Settings ..... 38
 Zone-in tone ..... 38
 Echo suppression ..... 39
 Charge check tone ..... 40
 Key tone ..... 41
 Calling line identification ..... 42
 Auto call ..... 43
 Any key answer ..... 44
 Power saving ..... 45
TAM (Telephone Answering Machine) ..... 46
 Recording your outgoing message ..... 46
 Activating TAM ..... 47
 Erasing the outgoing message ..... 47
 Setting the delay time ..... 48
 Setting the ring tone type ..... 49
 Playing back and erasing the incoming message (ICM) ..... 50
Voice Memo ..... 51
 Recording a Voice Memo ..... 51
 Playing back or erasing a Voice Memo ..... 52
ISDN Sub-address Setting ..... 53
 Activating sub-address setting ..... 53
 Storing the phone number of ISDN station ..... 54
 Making a call by specifying sub-address ..... 55
Setting Check ..... 56
Home Antenna Mode ..... 57
 Registering with home antenna ..... 57
 Clearing registration ..... 58
 Selecting Home Antenna mode ..... 59

Diagnosis of a girl with indistinguishable tinea of vellus hair by dermoscopy


·Case Report·

Diagnosis of a girl with indistinguishable tinea of vellus hair by dermoscopy

LI Meirong, LIU Wentao, WU Rong, YIN Songchao, FENG Peiying (Department of Dermatology, Third Affiliated Hospital, Sun Yat-sen University, Guangzhou 510630, China)

[Abstract] A 6-year-old girl presented with light erythema on her right cheek, without obvious scales or pustules. Under dermoscopy the diseased vellus hairs were found to be shortened, with a few scales at the root. The diseased hairs were plucked and fluorescently stained; under the fluorescence microscope the hair roots were surrounded by septate, colorless, transparent hyphae emitting bright blue fluorescence, which grew along the direction of the hair shaft, some extending freely toward the surroundings. No fungal hyphae or spores were seen in the skin scales. Morphological and molecular identification showed the pathogen to be Arthroderma otae (with Microsporum canis as its anamorph). Diagnosis: indistinguishable tinea of vellus hair in a child. Oral itraconazole capsules 100 mg once daily and topical terbinafine cream were administered. At the 4-week follow-up the child's facial background color had returned to normal and fungal microscopy was negative, so the medication was stopped. There was no recurrence during 3 months of follow-up.

[Key words] tinea of vellus hair of the face; dermoscopy; indistinguishable tinea (tinea incognito)
[CLC number] R756 [Document code] A [Article number] 1673-3827(2023)18-0344-03
[Chin J Mycol, 2023, 18(4): 344-346]

Tinea of vellus hair is a relatively uncommon special type of dermatophytosis. It is easily misdiagnosed and mistreated clinically, delaying the disease and giving rise to medical disputes [1]. Its clinical features are varied, and vellus hairs are tiny, fine, soft, medulla-free and translucent, so judging vellus hair involvement with the naked eye alone is difficult. Studies have shown that a noninvasive polarized dermoscope permits a rapid diagnosis from characteristic dermoscopic morphology and allows effective follow-up of treatment. We report a case of facial tinea of vellus hair in a girl diagnosed and treated with the aid of dermoscopy.

First author: LI Meirong, female (Han), M.Sc., technologist-in-charge. E-mail: limrong@mail.sysu.edu.cn
Corresponding author: FENG Peiying, fengpy@mail.sysu.edu.cn

1 Case report
The patient, a girl aged 6 years and 8 months weighing 18 kg, had had a coin-sized erythema on the right face for 2 weeks and had a history of contact with a pet cat. Dermatologic examination: a 2 cm × 3 cm erythema on the right face with a slightly paler center and indistinct borders, without obvious scales or pustules; a small amount of scale adhering to the base of a few vellus hairs was visible to the naked eye (Fig. 1). Under polarized dermoscopy (Demaite, Beijing) the lesion background was pinkish, with considerable telangiectasia and no scales or pustules; some vellus hairs were distinctly shorter than normal, their roots wrapped in yellowish scale (Fig. 2). Under dermoscopic guidance, suspicious diseased vellus hairs were plucked with sterile forceps, a fluorescent stain was applied (calcofluor white; Laifutemin, Jiangsu), and the hairs were examined immediately under a fluorescence microscope with the light source set to ultraviolet, and to ultraviolet overlaid on ordinary light. The roots of the diseased hairs were surrounded by septate, colorless, transparent hyphae emitting bright blue fluorescence; the hyphae grew along the direction of the hair shaft, and some extended freely toward the surroundings (Fig. 3). No fungal hyphae or spores were seen in the scales under the fluorescence microscope. Scales and vellus hairs from the lesion were inoculated on Sabouraud dextrose agar (SDA) containing chloramphenicol and cultured in an incubator at 28°C; after 10 days bright yellow villous colonies were visible (Fig. 4). Colonies were picked for slide culture; after fluorescent staining, the coverslip preparation showed thick-walled, septate, spindle-shaped macroconidia with curved apices and spiny projections (Fig. 5). The morphological diagnosis was Microsporum canis (the anamorph of Arthroderma otae). Total DNA was extracted from the sample with an OMEGA Fungal DNA kit and amplified by PCR with ITS1/ITS4 primers; the product was sequenced by Guangzhou Ruibo Biotechnology Co., Ltd. BLAST comparison of the sequence against GenBank on the NCBI website (http://www.ncbi.nlm.nih.gov/) showed 99% homology with Arthroderma otae (with M. canis as its anamorph).

Diagnosis and treatment: Facial tinea of vellus hair caused by M. canis (anamorph of A. otae) was diagnosed, and oral itraconazole capsules 100 mg once daily with topical terbinafine cream were prescribed. The child was instructed to stay away from the pet cat and to observe hand hygiene. At the 2-week follow-up the facial erythema had subsided, but short vellus hairs were still seen under the dermoscope and fungal hyphae were seen within the hairs under the microscope, so drug treatment was maintained. At the 4-week follow-up the color of the cheek had returned to normal, with no obvious scales on the skin or the vellus hairs (Fig. 2B); the vellus hairs had returned to a normal state under the dermoscope and fungal microscopy was negative, so the medication was stopped for observation. There was no recurrence during 3 months of follow-up.

Fig. 1 Coin-sized erythema on the right face of the child.
Fig. 2 A. Considerable telangiectasia, without scales or pustules, in the lesion; some vellus hairs distinctly shorter than normal (yellow arrow), with yellow scale wrapped around the base (×20, black arrow). B. After 4 weeks of treatment, no scale on the vellus hairs in the lesion (×20).
Fig. 3 Bright blue fluorescent hyphae surrounding the roots of the diseased vellus hairs, some hyphae spreading out freely (×400, yellow arrow). A. Ultraviolet light. B. Ultraviolet and ordinary light overlaid.
Fig. 4 On SDA, after 10 days of culture, radiating villous colonies producing a bright yellow pigment on the reverse. A. Front of the culture plate. B. Back of the culture plate.
Fig. 5 After slide culture, thick-walled and septate spindle-shaped macroconidia in the coverslip preparation stained with lactophenol cotton blue (×400, yellow arrow).

2 Discussion
Vellus hair, also called lanugo-type body hair, is fine, soft, medulla-free and translucent, and is found mainly on the face, limbs and trunk [2]. Tinea of vellus hair was first reported by Eby et al. [3] in 1971. Its clinical manifestations are varied: erosion and exudation, follicular micropustules, scaling and crusting; some eruptions have indistinct borders and appear as edematous erythema or hypopigmented macules. Without timely treatment it readily leads to follicular pustules, infectious granulomas and similar complications [4]. Because most patients have a history of topical corticosteroid or topical antifungal use and of resistance to treatment [5], tinea of vellus hair becomes even harder to diagnose. It is easily misdiagnosed clinically as seborrheic dermatitis, pyoderma, pityriasis alba, vitiligo, folliculitis, atopic dermatitis, rosacea, discoid lupus erythematosus, psoriasis and so on [4,6]. The dermoscopic features of vellus hairs in indistinguishable tinea are Morse-code hairs, deformable hairs, comma hairs and corkscrew hairs, together with perifollicular scaling [7]. The diagnosis is generally confirmed by mycologic examination finding fungal hyphae or spores within and/or around the vellus hairs; tinea corporis often coexists, and in most such cases hyphae and/or spores can also be found in the skin scales. In the present case the lesion was tiny, scraped scales showed no fungal hyphae, and hyphae were found only after plucking vellus hairs, which added further difficulty to the clinical diagnosis.

It is presumed that in tinea of vellus hair the fungus invades the scalp or hair through adhesion to keratinocytes, and that the invading spores destroy the structure and function of the hair by deforming and swelling, growing toward the hair root, and secreting proteolytic enzymes. The body resists fungal invasion through the unsaturated fatty acids of the scalp sebaceous glands, its own immunity, and the production of antimicrobial peptides; children, whose sebaceous glands are immature and whose resistance is poor, are therefore more likely to develop tinea capitis and tinea of vellus hair [8]. The causative fungi can aggregate on the hair surface, especially at the hair root, which acts as a fungal reservoir. Studies show that infected vellus hairs often have the following dermoscopic features: scaly plaques at the hair root, shortened and bent vellus hairs, broken hairs, corkscrew hairs, dystrophic hairs, Morse-code-like hairs, and follicular pustules [7,9]. The shortened vellus hairs with yellow scale in this case are one such characteristic hair change. Therefore, when clinicians and laboratory technicians see patients with tinea corporis, and especially tinea faciei, they should observe the eruption carefully, noting whether short hairs have been shed at the affected site and whether scattered, irregular new red papules rising above the skin surface have appeared in the center of the original erythema, and should ask whether the patient has applied corticosteroid ointments, or has relapsed repeatedly despite topical antifungals. Dermoscopy can then be used to inspect the vellus hairs and to guide sampling; when tinea of vellus hair is suspected, the short diseased hairs must be grasped precisely with thin, long-handled, toothless forceps for mycologic examination. Notably, during follow-up the medication can be stopped two weeks after laboratory fungal microscopy becomes negative, and repeating dermoscopy at that point allows a more accurate judgment of efficacy.

Diagnosis is the key to treating tinea of vellus hair, and the treatment itself is also special: systemic antifungal drugs are required, in a regimen similar to that for tinea capitis. This is because both tinea of vellus hair and tinea capitis are hair infections, so the drug must penetrate the hair follicle to take effect, and the two are consistent in age of onset, history of animal contact and causative species. Scholars therefore advocate combined oral and topical antifungal therapy for tinea of vellus hair, with a course of 4-6 weeks [10], longer than for simple tinea corporis, to achieve a good outcome.

In summary, dermoscopy helps to observe vellus hair morphology and to locate and pluck diseased hairs for fungal microscopy, assisting the diagnosis of tinea of vellus hair. During follow-up, repeated dermoscopy also helps to identify the turning point of treatment and so determine the course. Dermoscopy is thus of considerable value in the diagnosis and treatment of indistinguishable tinea and deserves wider clinical use.

References
[1] GÓMEZ-MOYANO E, CRESPO ERCHIGA V, MARTÍNEZ PILAR L, et al. Using dermoscopy to detect tinea of vellus hair[J]. Br J Dermatol, 2016, 174(3): 636-638.
[2] WANG L, FAN WX. Basic knowledge of hair follicle biology[J]. China Medical Abstracts (Dermatology), 2016, 33(4): 409-414. (in Chinese)
[3] EBY CS, JETTON RL. Nannizzia incurvata infection of vellus hair[J]. Br J Dermatol, 1971, 85(6): 582-584.
[4] YAN W. Tinea of vellus hair in children: a report of 16 cases[J]. Chin J Mycol, 2016, 11(2): 4. (in Chinese)
[5] KNÖPFEL N, DEL POZO LJ, ESCUDERO M DEL M, et al. Dermoscopic visualization of vellus hair involvement in tinea corporis: a criterion for systemic antifungal therapy?[J]. Pediatr Dermatol, 2015, 32(5): e226-e227.
[6] GÓMEZ-MOYANO E, CRESPO-ERCHIGA V. Tinea of vellus hair: an indication for systemic antifungal therapy[J]. Br J Dermatol, 2010, 163(3): 603-606.
[7] SONTHALIA S, ANKAD BS, GOLDUST M, et al. Dermoscopy - a simple and rapid in vivo diagnostic technique for tinea incognito[J]. An Bras Dermatol, 2019, 94(5): 612-614.
[8] HAY RJ. Tinea capitis: current status[J]. Mycopathologia, 2017, 182(1-2): 87-93.
[9] TANG J, RAN Y. Polarized and ultraviolet dermoscopy for the diagnosis of dermatophytosis of vellus hair[J]. Indian J Dermatol Venereol Leprol, 2020, 86(5): 607.
[10] SONG G, LIANG GZ, ZHANG MJ, et al. Advances in the clinical diagnosis and treatment of tinea of vellus hair[J]. Chin J Dermatol, 2021, 54(8): 3. (in Chinese)

[Received 2022-11-28]

LTE Release 12 Specification: 3GPP TS 36.212


3GPP TS 36.212 V12.0.0 (2013-12)
Technical Specification

3rd Generation Partnership Project;
Technical Specification Group Radio Access Network;
Evolved Universal Terrestrial Radio Access (E-UTRA);
Multiplexing and channel coding
(Release 12)

The present document has been developed within the 3rd Generation Partnership Project (3GPP™) and may be further elaborated for the purposes of 3GPP. The present document has not been subject to any approval process by the 3GPP Organizational Partners and shall not be implemented. This Specification is provided for future development work within 3GPP only. The Organizational Partners accept no liability for any use of this Specification. Specifications and reports for implementation of the 3GPP™ system should be obtained via the 3GPP Organizational Partners' Publications Offices.

Keywords
UMTS, radio, Layer 1

3GPP
Postal address
3GPP support office address
650 Route des Lucioles – Sophia Antipolis
Valbonne – France
Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16
Internet

Copyright Notification
No part may be reproduced except as authorized by written permission. The copyright and the foregoing restriction extend to reproduction in all media.
© 2013, 3GPP Organizational Partners (ARIB, ATIS, CCSA, ETSI, TTA, TTC). All rights reserved.
UMTS™ is a Trade Mark of ETSI registered for the benefit of its members
3GPP™ is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners
LTE™ is a Trade Mark of ETSI registered for the benefit of its Members and of the 3GPP Organizational Partners
GSM® and the GSM logo are registered and owned by the GSM Association

Contents
Foreword (5)
1 Scope (6)
2 References (6)
3 Definitions, symbols and abbreviations (6)
3.1 Definitions (6)
3.2 Symbols (6)
3.3 Abbreviations (7)
4 Mapping to physical channels (7)
4.1 Uplink (7)
4.2 Downlink (8)
5 Channel coding, multiplexing and interleaving (8)
5.1 Generic procedures (8)
5.1.1 CRC calculation (8)
5.1.2 Code block segmentation and code block CRC attachment
(9)
5.1.3 Channel coding (11)
5.1.3.1 Tail biting convolutional coding (11)
5.1.3.2 Turbo coding (12)
5.1.3.2.1 Turbo encoder (12)
5.1.3.2.2 Trellis termination for turbo encoder (13)
5.1.3.2.3 Turbo code internal interleaver (13)
5.1.4 Rate matching (15)
5.1.4.1 Rate matching for turbo coded transport channels (15)
5.1.4.1.1 Sub-block interleaver (15)
5.1.4.1.2 Bit collection, selection and transmission (16)
5.1.4.2 Rate matching for convolutionally coded transport channels and control information (18)
5.1.4.2.1 Sub-block interleaver (19)
5.1.4.2.2 Bit collection, selection and transmission (20)
5.1.5 Code block concatenation (20)
5.2 Uplink transport channels and control information (21)
5.2.1 Random access channel (21)
5.2.2 Uplink shared channel (21)
5.2.2.1 Transport block CRC attachment (22)
5.2.2.2 Code block segmentation and code block CRC attachment (22)
5.2.2.3 Channel coding of UL-SCH (23)
5.2.2.4 Rate matching (23)
5.2.2.5 Code block concatenation (23)
5.2.2.6 Channel coding of control information (23)
5.2.2.6.1 Channel quality information formats for wideband CQI reports (33)
5.2.2.6.2 Channel quality information formats for higher layer configured subband CQI reports (34)
5.2.2.6.3 Channel quality information formats for UE selected subband CQI reports (37)
5.2.2.6.4 Channel coding for CQI/PMI information in PUSCH (39)
5.2.2.6.5 Channel coding for more than 11 bits of HARQ-ACK information (40)
5.2.2.7 Data and control multiplexing (41)
5.2.2.8 Channel interleaver (42)
5.2.3 Uplink control information on PUCCH (44)
5.2.3.1 Channel coding for UCI HARQ-ACK (44)
5.2.3.2 Channel coding for UCI scheduling request (49)
5.2.3.3 Channel coding for UCI channel quality information (49)
5.2.3.3.1 Channel quality information formats for wideband reports (49)
5.2.3.3.2 Channel quality information formats for UE-selected sub-band reports (52)
5.2.3.4 Channel coding for UCI channel quality information and HARQ-ACK (56)
5.2.4 Uplink control information on PUSCH without UL-SCH data (56)
5.2.4.1 Channel coding of control information
(57)
5.2.4.2 Control information mapping (57)
5.2.4.3 Channel interleaver (58)
5.3 Downlink transport channels and control information (58)
5.3.1 Broadcast channel (58)
5.3.1.1 Transport block CRC attachment (58)
5.3.1.2 Channel coding (59)
5.3.1.3 Rate matching (59)
5.3.2 Downlink shared channel, Paging channel and Multicast channel (59)
5.3.2.1 Transport block CRC attachment (60)
5.3.2.2 Code block segmentation and code block CRC attachment (60)
5.3.2.3 Channel coding (61)
5.3.2.4 Rate matching (61)
5.3.2.5 Code block concatenation (61)
5.3.3 Downlink control information (61)
5.3.3.1 DCI formats (62)
5.3.3.1.1 Format 0 (62)
5.3.3.1.2 Format 1 (63)
5.3.3.1.3 Format 1A (64)
5.3.3.1.3A Format 1B (66)
5.3.3.1.4 Format 1C (68)
5.3.3.1.4A Format 1D (68)
5.3.3.1.5 Format 2 (70)
5.3.3.1.5A Format 2A (73)
5.3.3.1.5B Format 2B (75)
5.3.3.1.5C Format 2C (76)
5.3.3.1.5D Format 2D (78)
5.3.3.1.6 Format 3 (79)
5.3.3.1.7 Format 3A (79)
5.3.3.1.8 Format 4 (80)
5.3.3.2 CRC attachment (81)
5.3.3.3 Channel coding (82)
5.3.3.4 Rate matching (82)
5.3.4 Control format indicator (82)
5.3.4.1 Channel coding (83)
5.3.5 HARQ indicator (HI) (83)
5.3.5.1 Channel coding (83)
Annex A (informative): Change history (85)

Foreword
This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an identifying change of release date and an increase in version number as follows:

Version x.y.z

where:
x the first digit:
 1 presented to TSG for information;
 2 presented to TSG for approval;
 3 or greater indicates TSG approved document under change control.
y the second digit is incremented for all changes of substance, i.e.
technical enhancements, corrections, updates, etc.
z the third digit is incremented when editorial only changes have been incorporated in the document.

1 Scope
The present document specifies the coding, multiplexing and mapping to physical channels for E-UTRA.

2 References
The following documents contain provisions which, through reference in this text, constitute provisions of the present document.
- References are either specific (identified by date of publication, edition number, version number, etc.) or non-specific.
- For a specific reference, subsequent revisions do not apply.
- For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.

[1] 3GPP TR 21.905: "Vocabulary for 3GPP Specifications".
[2] 3GPP TS 36.211: "Evolved Universal Terrestrial Radio Access (E-UTRA); Physical channels and modulation".
[3] 3GPP TS 36.213: "Evolved Universal Terrestrial Radio Access (E-UTRA); Physical layer procedures".
[4] 3GPP TS 36.306: "Evolved Universal Terrestrial Radio Access (E-UTRA); User Equipment (UE) radio access capabilities".
[5] 3GPP TS 36.321: "Evolved Universal Terrestrial Radio Access (E-UTRA); Medium Access Control (MAC) protocol specification".
[6] 3GPP TS 36.331: "Evolved Universal Terrestrial Radio Access (E-UTRA); Radio Resource Control (RRC) protocol specification".

3 Definitions, symbols and abbreviations
3.1 Definitions
For the purposes of the present document, the terms and definitions given in [1] and the following apply.
A term defined in the present document takes precedence over the definition of the same term, if any, in [1].

Definition format
<defined term>: <definition>.

3.2 Symbols
For the purposes of the present document, the following symbols apply:

N_RB^DL  Downlink bandwidth configuration, expressed in number of resource blocks [2]
N_RB^UL  Uplink bandwidth configuration, expressed in number of resource blocks [2]
N_sc^RB  Resource block size in the frequency domain, expressed as a number of subcarriers
N_symb^PUSCH  Number of SC-FDMA symbols carrying PUSCH in a subframe
N_symb^PUSCH-initial  Number of SC-FDMA symbols carrying PUSCH in the initial PUSCH transmission subframe
N_symb^UL  Number of SC-FDMA symbols in an uplink slot
N_SRS  Number of SC-FDMA symbols used for SRS transmission in a subframe (0 or 1)

3.3 Abbreviations
For the purposes of the present document, the following abbreviations apply:

BCH Broadcast channel
CFI Control Format Indicator
CP Cyclic Prefix
CSI Channel State Information
DCI Downlink Control Information
DL-SCH Downlink Shared channel
EPDCCH Enhanced Physical Downlink Control channel
FDD Frequency Division Duplexing
HI HARQ indicator
MCH Multicast channel
PBCH Physical Broadcast channel
PCFICH Physical Control Format Indicator channel
PCH Paging channel
PDCCH Physical Downlink Control channel
PDSCH Physical Downlink Shared channel
PHICH Physical HARQ indicator channel
PMCH Physical Multicast channel
PMI Precoding Matrix Indicator
PRACH Physical Random Access channel
PUCCH Physical Uplink Control channel
PUSCH Physical Uplink Shared channel
RACH Random Access channel
RI Rank Indication
SR Scheduling Request
SRS Sounding Reference Signal
TDD Time Division Duplexing
TPMI Transmitted Precoding Matrix Indicator
UCI Uplink Control Information
UL-SCH Uplink Shared channel

4 Mapping to physical channels
4.1 Uplink
Table 4.1-1 specifies the mapping of the uplink transport channels to their corresponding physical channels.
Table 4.1-2 specifies the mapping of the uplink control channel information to its corresponding physical channel.

Table 4.1-1
Table 4.1-2

4.2 Downlink
Table 4.2-1 specifies the mapping of the downlink transport channels to their corresponding physical channels. Table 4.2-2 specifies the mapping of the downlink control channel information to its corresponding physical channel.

Table 4.2-1
Table 4.2-2

5 Channel coding, multiplexing and interleaving
Data and control streams from/to MAC layer are encoded/decoded to offer transport and control services over the radio transmission link. Channel coding scheme is a combination of error detection, error correcting, rate matching, interleaving and transport channel or control information mapping onto/splitting from physical channels.

5.1 Generic procedures
This section contains coding procedures which are used for more than one transport channel or control information type.

5.1.1 CRC calculation
Denote the input bits to the CRC computation by a_0, a_1, a_2, a_3, ..., a_{A-1}, and the parity bits by p_0, p_1, p_2, p_3, ..., p_{L-1}. A is the size of the input sequence and L is the number of parity bits.
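As an illustrative sketch (not part of the specification), the systematic CRC attachment defined in this section, appending L parity bits so that the resulting sequence, read as a GF(2) polynomial, is divisible by the generator polynomial, can be written as:

```python
def crc_attach(a, g):
    # Systematic CRC over GF(2): append L parity bits so that the
    # polynomial a_0*D^(A+L-1) + ... + p_(L-1) is divisible by g.
    # a: input bits a_0..a_(A-1); g: generator polynomial as an int
    # with the D^L term included (e.g. G_CRC24A below).
    L = g.bit_length() - 1
    rem = 0
    for bit in list(a) + [0] * L:   # multiply by D^L, then long-divide
        rem = (rem << 1) | bit
        if rem >> L:                # leading term present: subtract g
            rem ^= g
    return list(a) + [(rem >> (L - 1 - i)) & 1 for i in range(L)]

# gCRC24A(D) = D^24 + D^23 + D^18 + D^17 + D^14 + D^11 + D^10
#            + D^7 + D^6 + D^5 + D^4 + D^3 + D + 1, as a bit mask
G_CRC24A = 0x1864CFB
```

Re-dividing the attached sequence b_0, ..., b_{B-1} by the same generator yields remainder 0, which is the defining property stated in this section.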
The parity bits are generated by one of the following cyclic generator polynomials:

- gCRC24A(D) = [D^24 + D^23 + D^18 + D^17 + D^14 + D^11 + D^10 + D^7 + D^6 + D^5 + D^4 + D^3 + D + 1] and;
- gCRC24B(D) = [D^24 + D^23 + D^6 + D^5 + D + 1] for a CRC length L = 24 and;
- gCRC16(D) = [D^16 + D^12 + D^5 + 1] for a CRC length L = 16.
- gCRC8(D) = [D^8 + D^7 + D^4 + D^3 + D + 1] for a CRC length of L = 8.

The encoding is performed in a systematic form, which means that in GF(2), the polynomial:

a_0 D^{A+23} + a_1 D^{A+22} + ... + a_{A-1} D^24 + p_0 D^23 + p_1 D^22 + ... + p_22 D + p_23

yields a remainder equal to 0 when divided by the corresponding length-24 CRC generator polynomial, gCRC24A(D) or gCRC24B(D); the polynomial:

a_0 D^{A+15} + a_1 D^{A+14} + ... + a_{A-1} D^16 + p_0 D^15 + p_1 D^14 + ... + p_14 D + p_15

yields a remainder equal to 0 when divided by gCRC16(D); and the polynomial:

a_0 D^{A+7} + a_1 D^{A+6} + ... + a_{A-1} D^8 + p_0 D^7 + p_1 D^6 + ... + p_6 D + p_7

yields a remainder equal to 0 when divided by gCRC8(D).

The bits after CRC attachment are denoted by b_0, b_1, b_2, b_3, ..., b_{B-1}, where B = A + L. The relation between a_k and b_k is:

b_k = a_k        for k = 0, 1, 2, ..., A-1
b_k = p_{k-A}    for k = A, A+1, A+2, ..., A+L-1.

5.1.2 Code block segmentation and code block CRC attachment
The input bit sequence to the code block segmentation is denoted by b_0, b_1, b_2, b_3, ..., b_{B-1}, where B > 0. If B is larger than the maximum code block size Z, segmentation of the input bit sequence is performed and an additional CRC sequence of L = 24 bits is attached to each code block. The maximum code block size is:
- Z = 6144.

If the number of filler bits F calculated below is not 0, filler bits are added to the beginning of the first block. Note that if B < 40, filler bits are added to the beginning of the code block. The filler bits shall be set to <NULL> at the input to the encoder.

Total number of code blocks C is determined by:

if B ≤ Z
    L = 0
    Number of code blocks: C = 1
    B' = B
else
    L = 24
    Number of code blocks: C = ⌈B / (Z - L)⌉.
    B' = B + C · L
end if

The bits output from code block segmentation, for C ≠ 0, are denoted by c_{r0}, c_{r1}, c_{r2}, c_{r3}, ..., c_{r(K_r - 1)}, where r is the code block number, and K_r is the number of bits for the code block number r.

Number of bits in each code block (applicable for C ≠ 0 only):

First segmentation size: K+ = minimum K in table 5.1.3-3 such that C · K ≥ B'

if C = 1
    the number of code blocks with length K+ is C+ = 1, K- = 0, C- = 0
else if C > 1
    Second segmentation size: K- = maximum K in table 5.1.3-3 such that K < K+
    ΔK = K+ - K-
    Number of segments of size K-: C- = ⌊(C · K+ - B') / ΔK⌋.
    Number of segments of size K+: C+ = C - C-.
end if

Number of filler bits: F = C+ · K+ + C- · K- - B'

for k = 0 to F-1                -- Insertion of filler bits
    c_{0k} = <NULL>
end for

k = F
s = 0
for r = 0 to C-1
    if r < C-
        K_r = K-
    else
        K_r = K+
    end if
    while k < K_r - L
        c_{rk} = b_s
        k = k + 1
        s = s + 1
    end while
    if C > 1
        The sequence c_{r0}, c_{r1}, c_{r2}, c_{r3}, ..., c_{r(K_r - L - 1)} is used to calculate the CRC parity bits p_{r0}, p_{r1}, p_{r2}, ..., p_{r(L-1)} according to section 5.1.1 with the generator polynomial gCRC24B(D). For CRC calculation it is assumed that filler bits, if present, have the value 0.
        while k < K_r
            c_{rk} = p_{r(k + L - K_r)}
            k = k + 1
        end while
    end if
    k = 0
end for

5.1.3 Channel coding
The bit sequence input for a given code block to channel coding is denoted by c_0, c_1, c_2, c_3, ..., c_{K-1}, where K is the number of bits to encode. After encoding the bits are denoted by d_0^(i), d_1^(i), d_2^(i), d_3^(i), ..., d_{D-1}^(i), where D is the number of encoded bits per output stream and i indexes the encoder output stream. The relation between c_k and d_k^(i) and between K and D is dependent on the channel coding scheme.

The following channel coding schemes can be applied to TrCHs:
- tail biting convolutional coding;
- turbo coding.

Usage of coding scheme and coding rate for the different types of TrCH is shown in table 5.1.3-1. Usage of coding scheme and coding rate for the different control information types is shown in table 5.1.3-2.
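The segmentation rule of section 5.1.2 can be sketched as follows. This is an illustrative reading of the pseudocode above, not a normative implementation; the candidate block sizes are assumed to follow the spacing of Table 5.1.3-3 (K = 40..512 step 8, 528..1024 step 16, 1056..2048 step 32, 2112..6144 step 64).

```python
import math

# Candidate code block sizes K (assumed spacing of Table 5.1.3-3)
K_TABLE = (list(range(40, 513, 8)) + list(range(528, 1025, 16))
           + list(range(1056, 2049, 32)) + list(range(2112, 6145, 64)))

def segment(B, Z=6144):
    # Code block segmentation of section 5.1.2: number of blocks of
    # each size K+/K- and the filler-bit count F for an input of B bits.
    if B <= Z:
        L, C, B_prime = 0, 1, B
    else:
        L = 24
        C = math.ceil(B / (Z - L))
        B_prime = B + C * L
    K_plus = min(K for K in K_TABLE if C * K >= B_prime)
    if C == 1:
        C_plus, K_minus, C_minus = 1, 0, 0
    else:
        K_minus = max(K for K in K_TABLE if K < K_plus)
        C_minus = (C * K_plus - B_prime) // (K_plus - K_minus)
        C_plus = C - C_minus
    F = C_plus * K_plus + C_minus * K_minus - B_prime
    return {"C": C, "K+": K_plus, "C+": C_plus,
            "K-": K_minus, "C-": C_minus, "F": F}
```

Under these assumptions, B = 6145 yields two code blocks (one of 3136 bits, one of 3072 bits) with F = 15 filler bits.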
The values of D in connection with each coding scheme:
- tail biting convolutional coding with rate 1/3: D = K;
- turbo coding with rate 1/3: D = K + 4.

The range for the output stream index i is 0, 1 and 2 for both coding schemes.

Table 5.1.3-1: Usage of channel coding scheme and coding rate for TrCHs.
Table 5.1.3-2: Usage of channel coding scheme and coding rate for control information.

5.1.3.1 Tail biting convolutional coding
A tail biting convolutional code with constraint length 7 and coding rate 1/3 is defined. The configuration of the convolutional encoder is presented in figure 5.1.3-1.

The initial value of the shift register of the encoder shall be set to the values corresponding to the last 6 information bits in the input stream so that the initial and final states of the shift register are the same. Therefore, denoting the shift register of the encoder by s_0, s_1, s_2, ..., s_5, the initial value of the shift register shall be set to

s_i = c_{K-1-i}

with generator polynomials

G_0 = 133 (octal)
G_1 = 171 (octal)
G_2 = 165 (octal)

Figure 5.1.3-1: Rate 1/3 tail biting convolutional encoder.

The encoder output streams d_k^(0), d_k^(1) and d_k^(2) correspond to the first, second and third parity streams, respectively, as shown in Figure 5.1.3-1.

5.1.3.2 Turbo coding
5.1.3.2.1 Turbo encoder
The scheme of turbo encoder is a Parallel Concatenated Convolutional Code (PCCC) with two 8-state constituent encoders and one turbo code internal interleaver. The coding rate of turbo encoder is 1/3.
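A minimal bit-level sketch of the rate-1/3 tail-biting encoder of section 5.1.3.1 (illustrative, not normative; it assumes K ≥ 6 and the common convention that the octal generator taps are read MSB-first, with the most significant tap applied to the current input bit):

```python
def conv_encode_tail_biting(c):
    # Rate-1/3 tail-biting convolutional encoder, constraint length 7,
    # generators G0 = 133, G1 = 171, G2 = 165 (octal).
    G = (0o133, 0o171, 0o165)
    K = len(c)
    # Initialize the shift register s_0..s_5 from the last 6 information
    # bits (s_i = c_{K-1-i}) so initial and final states coincide.
    s = [c[K - 1 - i] for i in range(6)]
    streams = ([], [], [])
    for k in range(K):
        reg = [c[k]] + s              # current input followed by s_0..s_5
        for i, g in enumerate(G):
            bit = 0
            for j in range(7):
                if (g >> (6 - j)) & 1:
                    bit ^= reg[j]
            streams[i].append(bit)
        s = reg[:6]                   # shift: input becomes the new s_0
    return streams
```

Because the register starts in the state it will end in, no termination bits are needed, which is the point of tail biting.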
The structure of turbo encoder is illustrated in figure 5.1.3-2.

The transfer function of the 8-state constituent code for the PCCC is:

G(D) = [1, g_1(D)/g_0(D)],

where

g_0(D) = 1 + D^2 + D^3,
g_1(D) = 1 + D + D^3.

The initial value of the shift registers of the 8-state constituent encoders shall be all zeros when starting to encode the input bits.

The output from the turbo encoder is

d_k^(0) = x_k
d_k^(1) = z_k
d_k^(2) = z'_k

for k = 0, 1, 2, ..., K-1.

If the code block to be encoded is the 0-th code block and the number of filler bits is greater than zero, i.e., F > 0, then the encoder shall set c_k = 0, k = 0, ..., (F-1) at its input and shall set d_k^(0) = <NULL>, k = 0, ..., (F-1) and d_k^(1) = <NULL>, k = 0, ..., (F-1) at its output.

The bits input to the turbo encoder are denoted by c_0, c_1, c_2, c_3, ..., c_{K-1}, and the bits output from the first and second 8-state constituent encoders are denoted by z_0, z_1, z_2, z_3, ..., z_{K-1} and z'_0, z'_1, z'_2, z'_3, ..., z'_{K-1}, respectively. The bits output from the turbo code internal interleaver are denoted by c'_0, c'_1, ..., c'_{K-1}, and these bits are to be the input to the second 8-state constituent encoder.

Figure 5.1.3-2: Structure of rate 1/3 turbo encoder (dotted lines apply for trellis termination only).

5.1.3.2.2 Trellis termination for turbo encoder
Trellis termination is performed by taking the tail bits from the shift register feedback after all information bits are encoded. Tail bits are padded after the encoding of information bits.

The first three tail bits shall be used to terminate the first constituent encoder (upper switch of figure 5.1.3-2 in lower position) while the second constituent encoder is disabled. The last three tail bits shall be used to terminate the second constituent encoder (lower switch of figure 5.1.3-2 in lower position) while the first constituent encoder is disabled.
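One 8-state constituent encoder with the transfer function G(D) given in section 5.1.3.2.1 can be sketched as a recursive systematic filter (illustrative only; the tail bits described for trellis termination are omitted here):

```python
def constituent_encode(bits):
    # 8-state recursive systematic convolutional encoder with
    # feedback g0(D) = 1 + D^2 + D^3 and feedforward g1(D) = 1 + D + D^3.
    # Registers start at zero, as required when encoding begins.
    s = [0, 0, 0]                       # s = [a_{k-1}, a_{k-2}, a_{k-3}]
    parity = []
    for c in bits:
        a = c ^ s[1] ^ s[2]             # feedback: c_k + a_{k-2} + a_{k-3}
        parity.append(a ^ s[0] ^ s[2])  # g1 taps: a_k + a_{k-1} + a_{k-3}
        s = [a, s[0], s[1]]
    return parity   # z_k; the systematic stream x_k is the input itself
```

Running the interleaved sequence c' through a second copy of this encoder gives z'_k, and d^(0), d^(1), d^(2) are then x_k, z_k, z'_k as defined above.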
The transmitted bits for trellis termination shall then be:K K x d =)0(, 1)0(1++=K K z d , K K x d '=+)0(2, 1)0(3++'=K K z d K K z d =)1(, 2)1(1++=K K x d , K K z d '=+)1(2, 2)1(3++'=K K x d1)2(+=K K x d , 2)2(1++=K K z d , 1)2(2++'=K K x d , 2)2(3++'=K K z d5.1.3.2.3 Turbo code internal interleaverThe bits input to the turbo code internal interleaver are denoted by 110,...,,-K c c c , where K is the number of input bits.The bits output from the turbo code internal interleaver are denoted by 110,...,,-'''K c c c . The relationship between the input and output bits is as follows:()i i c c ∏=', i =0, 1,…, (K -1)where the relationship between the output index i and the input index )(i ∏ satisfies the following quadratic form:()K i f i f i mod )(221⋅+⋅=∏The parameters 1f and 2f depend on the block size K and are summarized in Table 5.1.3-3.Table 5.1.3-3: Turbo code internal interleaver parameters.5.1.4Rate matching5.1.4.1Rate matching for turbo coded transport channelsThe rate matching for turbo coded transport channels is defined per coded block and consists of interleaving the threeinformation bit streams )0(k d , )1(k d and )2(k d , followed by the collection of bits and the generation of a circular buffer asdepicted in Figure 5.1.4-1. The output bits for each code block are transmitted as described in section 5.1.4.1.2.Figure 5.1.4-1. 
The bit stream d_k^(0) is interleaved according to the sub-block interleaver defined in section 5.1.4.1.1 with an output sequence defined as v_0^(0), v_1^(0), v_2^(0), ..., v_{K_Π-1}^(0), where K_Π is defined in section 5.1.4.1.1.

The bit stream d_k^(1) is interleaved according to the sub-block interleaver defined in section 5.1.4.1.1 with an output sequence defined as v_0^(1), v_1^(1), v_2^(1), ..., v_{K_Π-1}^(1).

The bit stream d_k^(2) is interleaved according to the sub-block interleaver defined in section 5.1.4.1.1 with an output sequence defined as v_0^(2), v_1^(2), v_2^(2), ..., v_{K_Π-1}^(2).

The sequence of bits e_k for transmission is generated according to section 5.1.4.1.2.

5.1.4.1.1 Sub-block interleaver

The bits input to the block interleaver are denoted by d_0^(i), d_1^(i), d_2^(i), ..., d_{D-1}^(i), where D is the number of bits. The output bit sequence from the block interleaver is derived as follows:

(1) Assign C_subblock^TC = 32 to be the number of columns of the matrix. The columns of the matrix are numbered 0, 1, 2, ..., C_subblock^TC - 1 from left to right.

(2) Determine the number of rows of the matrix, R_subblock^TC, by finding the minimum integer R_subblock^TC such that:

    D ≤ (R_subblock^TC × C_subblock^TC)

The rows of the rectangular matrix are numbered 0, 1, 2, ..., R_subblock^TC - 1 from top to bottom.

(3) If (R_subblock^TC × C_subblock^TC) > D, then N_D = (R_subblock^TC × C_subblock^TC - D) dummy bits are padded such that y_k = <NULL> for k = 0, 1, ..., N_D - 1.
Then, y_{N_D+k} = d_k^(i), k = 0, 1, ..., D-1, and the bit sequence y_k is written into the (R_subblock^TC × C_subblock^TC) matrix row by row, starting with bit y_0 in column 0 of row 0 (writing C for C_subblock^TC and R for R_subblock^TC):

    [ y_0              y_1              y_2              ...  y_{C-1}
      y_C              y_{C+1}          y_{C+2}          ...  y_{2C-1}
      ...
      y_{(R-1)×C}      y_{(R-1)×C+1}    y_{(R-1)×C+2}    ...  y_{R×C-1} ]

For d_k^(0) and d_k^(1):

(4) Perform the inter-column permutation for the matrix based on the pattern P(j), j ∈ {0, 1, ..., C_subblock^TC - 1}, that is shown in Table 5.1.4-1, where P(j) is the original column position of the j-th permuted column. After permutation of the columns, the inter-column permuted (R_subblock^TC × C_subblock^TC) matrix is equal to:

    [ y_{P(0)}            y_{P(1)}            ...  y_{P(C-1)}
      y_{P(0)+C}          y_{P(1)+C}          ...  y_{P(C-1)+C}
      ...
      y_{P(0)+(R-1)×C}    y_{P(1)+(R-1)×C}    ...  y_{P(C-1)+(R-1)×C} ]

(5) The output of the block interleaver is the bit sequence read out column by column from the inter-column permuted (R_subblock^TC × C_subblock^TC) matrix.
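Steps (1)-(5) for d^(0)/d^(1) condense to a few lines; this is an illustrative sketch (function name invented, None standing for <NULL>), with the column pattern being the 32-entry bit-reversal permutation of Table 5.1.4-1:

```python
# Table 5.1.4-1 inter-column permutation pattern (5-bit bit reversal).
P = [0, 16, 8, 24, 4, 20, 12, 28, 2, 18, 10, 26, 6, 22, 14, 30,
     1, 17, 9, 25, 5, 21, 13, 29, 3, 19, 11, 27, 7, 23, 15, 31]

def subblock_interleave(d):
    """Sub-block interleaver for the d^(0)/d^(1) streams (sketch)."""
    C = 32
    R = -(-len(d) // C)                       # minimum number of rows (ceil)
    y = [None] * (R * C - len(d)) + list(d)   # N_D dummy <NULL> bits padded in front
    # Element (r, j) of the permuted matrix is y[r*C + P[j]]; read column by column.
    return [y[r * C + P[j]] for j in range(C) for r in range(R)]
```

The dummy None entries ride along through the permutation and are only discarded later, during the bit selection of section 5.1.4.1.2.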
The bits after sub-block interleaving are denoted by v_0^(i), v_1^(i), v_2^(i), ..., v_{K_Π-1}^(i), where v_0^(i) corresponds to y_{P(0)}, v_1^(i) to y_{P(0)+C_subblock^TC}, ..., and K_Π = (R_subblock^TC × C_subblock^TC).

For d_k^(2):

(4) The output of the sub-block interleaver is denoted by v_0^(2), v_1^(2), v_2^(2), ..., v_{K_Π-1}^(2), where v_k^(2) = y_{π(k)} and where

    π(k) = ( P(⌊k / R_subblock^TC⌋) + C_subblock^TC × (k mod R_subblock^TC) + 1 ) mod K_Π

The permutation function P is defined in Table 5.1.4-1.

Table 5.1.4-1: Inter-column permutation pattern for sub-block interleaver.

5.1.4.1.2 Bit collection, selection and transmission

The circular buffer of length K_w = 3·K_Π for the r-th coded block is generated as follows:

    w_k = v_k^(0)              for k = 0, ..., K_Π - 1
    w_{K_Π+2k} = v_k^(1)       for k = 0, ..., K_Π - 1
    w_{K_Π+2k+1} = v_k^(2)     for k = 0, ..., K_Π - 1

Denote the soft buffer size for the transport block by N_IR bits and the soft buffer size for the r-th code block by N_cb bits. The size N_cb is obtained as follows, where C is the number of code blocks computed in section 5.1.2:

- N_cb = min( ⌊N_IR / C⌋, K_w ) for DL-SCH and PCH transport channels
- N_cb = K_w for UL-SCH and MCH transport channels

where N_IR is equal to:

    N_IR = ⌊ N_soft / ( K_C · K_MIMO · min(M_DL_HARQ, M_limit) ) ⌋

where:

If the UE signals ue-Category-v1020, and is configured with transmission mode 9 or transmission mode 10 for the DL cell, N_soft is the total number of soft channel bits [4] according to the UE category indicated by ue-Category-v1020 [6]. Otherwise, N_soft is the total number of soft channel bits [4] according to the UE category indicated by ue-Category (without suffix) [6].
If N_soft = 35982720, K_C = 5,
elseif N_soft = 3654144 and the UE is capable of supporting no more than a maximum of two spatial layers for the DL cell, K_C = 2,
else K_C = 1,
end if.

K_MIMO is equal to 2 if the UE is configured to receive PDSCH transmissions based on transmission modes 3, 4, 8, 9 or 10 as defined in section 7.1 of [3], and is equal to 1 otherwise.

If the UE is configured with more than one serving cell and if at least two serving cells have different UL/DL configurations, M_DL_HARQ is the maximum number of DL HARQ processes as defined in Table 7-1 in [3] for the DL-reference UL/DL configuration of the serving cell. Otherwise, M_DL_HARQ is the maximum number of DL HARQ processes as defined in section 7 of [3]. M_limit is a constant equal to 8.

Denoting by E the rate matching output sequence length for the r-th coded block, and by rv_idx the redundancy version number for this transmission (rv_idx = 0, 1, 2 or 3), the rate matching output bit sequence is e_k, k = 0, 1, ..., E-1. Define by G the total number of bits available for the transmission of one transport block.

Set G' = G / (N_L · Q_m), where Q_m is equal to 2 for QPSK, 4 for 16QAM and 6 for 64QAM, and where:

- For transmit diversity, N_L is equal to 2,
- Otherwise, N_L is equal to the number of layers a transport block is mapped onto.

Set γ = G' mod C, where C is the number of code blocks computed in section 5.1.2.

    if r ≤ C - γ - 1
        set E = N_L · Q_m · ⌊G' / C⌋
    else
        set E = N_L · Q_m · ⌈G' / C⌉
    end if

Set k_0 = R_subblock^TC · ( 2 · ⌈N_cb / (8 · R_subblock^TC)⌉ · rv_idx + 2 ), where R_subblock^TC is the number of rows defined in section 5.1.4.1.1.

    Set k = 0 and j = 0
    while { k < E }
        if w_{(k_0 + j) mod N_cb} ≠ <NULL>
            e_k = w_{(k_0 + j) mod N_cb}
            k = k + 1
        end if
        j = j + 1
    end while

5.1.4.2 Rate matching for convolutionally coded transport channels and control information

The rate matching for convolutionally coded transport channels and control information consists of interleaving the three bit streams d_k^(0), d_k^(1) and d_k^(2), followed by the collection of bits and the generation of a circular buffer as depicted in Figure 5.1.4-2. The output bits are transmitted as described in section 5.1.4.2.2.

Figure 5.1.4-2: Rate matching for convolutionally coded transport channels and control information.

The bit stream d_k^(0) is interleaved according to the sub-block interleaver defined in section 5.1.4.2.1 with an output sequence defined as v_0^(0), v_1^(0), v_2^(0), ..., v_{K_Π-1}^(0), where K_Π is defined in section 5.1.4.2.1.

The bit stream d_k^(1) is interleaved according to the sub-block interleaver defined in section 5.1.4.2.1 with an output sequence defined as v_0^(1), v_1^(1), v_2^(1), ..., v_{K_Π-1}^(1).
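The bit collection and redundancy-version selection of section 5.1.4.1.2 can be sketched as follows for the N_cb = K_w case (UL-SCH/MCH); the function name and stream arguments are invented for illustration, with None standing for <NULL>:

```python
# Sketch of circular-buffer bit collection and selection: interlace the three
# sub-block-interleaved streams into w, compute k0 from rv_idx, then read E
# non-<NULL> bits, wrapping around the buffer.
def rate_match(v0, v1, v2, E, rv_idx):
    K_pi = len(v0)
    R = K_pi // 32                            # sub-block interleaver row count
    # w = v0 followed by v1/v2 interlaced: w[K_pi+2k] = v1[k], w[K_pi+2k+1] = v2[k]
    w = list(v0) + [b for pair in zip(v1, v2) for b in pair]
    N_cb = len(w)                             # UL-SCH/MCH: full buffer K_w
    k0 = R * (2 * -(-N_cb // (8 * R)) * rv_idx + 2)
    e, j = [], 0
    while len(e) < E:
        bit = w[(k0 + j) % N_cb]
        if bit is not None:                   # skip <NULL> filler positions
            e.append(bit)
        j += 1
    return e
```

Different rv_idx values simply shift the starting point k0 around the same circular buffer, which is how HARQ retransmissions obtain different (but overlapping) bit subsets.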

Research Statement


Parikshit Gopalan

My research focuses on fundamental algebraic problems such as polynomial reconstruction and interpolation arising from various areas of theoretical computer science. My main algorithmic contributions include the first algorithm for list-decoding a well-known family of codes called Reed-Muller codes [13], and the first algorithms for agnostically learning parity functions [3] and decision trees [11] under the uniform distribution. On the complexity-theoretic side, my contributions include the best-known hardness results for reconstructing low-degree multivariate polynomials from noisy data [12] and the discovery of a connection between representations of Boolean functions by polynomials and communication complexity [2].

1 Introduction

Many important recent developments in theoretical computer science, such as probabilistic proof checking, deterministic primality testing and advancements in algorithmic coding theory, share a common feature: the extensive use of techniques from algebra. My research has centered around the application of these methods to problems in coding theory, computational learning, hardness of approximation and Boolean function complexity.

While at first glance these might seem like four research areas that are not immediately related, there are several beautiful connections between them. Perhaps the best illustration of these links is the noisy parity problem, where the goal is to recover a parity function from a corrupted set of evaluations. The seminal Goldreich-Levin algorithm solves a version of this problem; this result initiated the study of list-decoding algorithms for error-correcting codes [5]. An alternate solution is the Kushilevitz-Mansour algorithm [19], which is a crucial component in algorithms for learning decision trees and DNFs [17]. Håstad's ground-breaking work on the hardness of this problem has revolutionized our understanding of inapproximability [16]. All these results rely on insights into the Fourier structure of Boolean functions.

As I illustrate below, my research has contributed to a better understanding of these connections, and yielded progress on some important open problems in these areas.

2 Coding Theory

The broad goal of coding theory is to enable meaningful communication in the presence of noise, by suitably encoding the messages. The natural algorithmic problem associated with this task is that of decoding, or recovering the transmitted message from a corrupted encoding. The last twenty years have witnessed a revolution with the discovery of several powerful decoding algorithms for well-known families of error-correcting codes. A key role has been played by the notion of list-decoding, a relaxation of the classical decoding problem where we are willing to settle for a small list of candidate transmitted messages rather than insisting on a unique answer. This relaxation allows one to break the classical "half the minimum distance" barrier for decoding error-correcting codes. We now know powerful list-decoding algorithms for several important code families; these algorithms have also made a huge impact on complexity theory [5, 15, 23].

List-Decoding Reed-Muller Codes: In recent work with Klivans and Zuckerman, we give the first such list-decoding algorithm for a well-studied family of codes known as Reed-Muller codes, obtained from low-degree polynomials over the finite field F2 [13]. The highlight of this work is that our algorithm is able to tolerate error-rates which are much higher than what is known as the Johnson bound in coding theory. Our results imply new combinatorial bounds on the error-correcting capability of these codes. While Reed-Muller codes have been studied extensively in both the coding theory and computer science communities, our result is the first to show that they are resilient to remarkably high error-rates. Our algorithm is based on a novel view of the Goldreich-Levin algorithm as a reduction from list-decoding to unique-decoding; our view readily extends to polynomials
of arbitrary degree over any field. Our result complements recent work on the Gowers norm, showing that Reed-Muller codes are testable up to large distances [21].

Hardness of Polynomial Reconstruction: In the polynomial reconstruction problem, one is asked to recover a low-degree polynomial from its evaluations at a set of points, where some of the values could be incorrect. The reconstruction problem is ubiquitous in both coding theory and computational learning; both the noisy parity problem and the Reed-Muller decoding problem are instances of it. In joint work with Khot and Saket, we address the complexity of this problem and establish the first hardness results for multivariate polynomials of arbitrary degree [12]. Previously, the only hardness known was for degree 1, which follows from the celebrated work of Håstad [16]. Our work introduces a powerful new algebraic technique called global folding, which allows one to bypass a module called consistency testing that is crucial to most hardness results. I believe this technique will find other applications.

Average-Case Hardness of NP: Algorithmic advances in decoding of error-correcting codes have helped us gain a deeper understanding of the connections between worst-case and average-case complexity [23, 24]. In recent work with Guruswami, we use this paradigm to explore the average-case complexity of problems in NP against algorithms in P [8]. We present the first hardness amplification result in this setting by giving a construction of an error-correcting code where most of the symbols can be recovered correctly from a corrupted codeword by a deterministic algorithm that probes very few locations in the codeword. The novelty of our work is that our decoder is deterministic, whereas previous algorithms for this task were all randomized.

3 Computational Learning

Computational learning aims to understand the algorithmic issues underlying how we learn from examples, and to explore how the complexity of learning is influenced by factors such as the ability to ask queries and the possibility of incorrect answers. Learning algorithms for a concept class typically rely on understanding the structure of that concept class, which naturally ties learning to Boolean function complexity. Learning in the presence of noise has several connections to decoding from errors. My work in this area addresses the learnability of basic concept classes such as decision trees, parities and halfspaces.

Learning Decision Trees Agnostically: The problem of learning decision trees is one of the central open problems in computational learning. Decision trees are also a popular hypothesis class in practice. In recent work with Kalai and Klivans, we give a query algorithm for learning decision trees with respect to the uniform distribution on inputs in the agnostic model: given black-box access to an arbitrary Boolean function, our algorithm finds a hypothesis that agrees with it on almost as many inputs as the best decision tree [11]. Equivalently, we can learn decision trees even when the data is corrupted adversarially; this is the first polynomial-time algorithm for learning decision trees in a harsh noise model. Previous decision-tree learning algorithms applied only to the noiseless setting. Our algorithm can be viewed as the agnostic analog of the Kushilevitz-Mansour algorithm [19]. The core of our algorithm is a procedure to implicitly solve a convex optimization problem in high dimensions using approximate gradient projection.

The Noisy Parity Problem: The noisy parity problem has come to be widely regarded as a hard problem. In work with Feldman et al., we present evidence supporting this belief [3]. We show that in the setting of learning from random examples (without queries), several outstanding open problems such as learning juntas, decision trees and DNFs reduce to restricted versions of the problem of learning parities with random noise. Our result shows that, in some sense, noisy parity captures the gap between learning from random examples
and learning with queries, as it is believed to be hard in the former setting and is known to be easy in the latter. On the positive side, we present the first non-trivial algorithm for the noisy parity problem under the uniform distribution in the adversarial noise model. Our result shows that, somewhat surprisingly, adversarial noise is no harder to handle than random noise.

Hardness of Learning Halfspaces: The problem of learning halfspaces is a fundamental problem in computational learning. One could hope to design algorithms that are robust even in the presence of a few incorrectly labeled points. Indeed, such algorithms are known in the setting where the noise is random. In work with Feldman et al., we show that the setting of adversarial errors might be intractable: given a set of points where 99% are correctly labeled by some halfspace, it is NP-hard to find a halfspace that correctly labels even 51% of the points [3].

4 Prime versus Composite Problems

My thesis work focuses on new aspects of an old and famous problem: the difference between primes and composites. Beyond basic problems like primality and factoring, there are many other computational issues that are not yet well understood. For instance, in circuit complexity, we have excellent lower bounds for small-depth circuits with mod 2 gates, but the same problem for circuits with mod 6 gates is wide open. Likewise in combinatorics, set systems where the sizes of the sets need to satisfy certain modular conditions are well studied. Again the prime case is well understood, but little is known for composites. In all these problems, the algebraic techniques that work well in the prime case break down for composites.

Boolean function complexity: Perhaps the simplest class of circuits for which we have been unable to show lower bounds is small-depth circuits with And, Or and Mod-m gates where m is composite; indeed this is one of the frontier open problems in circuit complexity. When m is prime, such bounds were proved by Razborov and Smolensky [20, 22]. One reason for this gap is that we do not fully understand the computational power of polynomials over composites; Barrington et al. were the first to show that such polynomials are surprisingly powerful [1]. In joint work with Bhatnagar and Lipton, we solve an important special case: when the polynomials are symmetric in their variables [2]. We show an equivalence between computing Boolean functions by symmetric polynomials over composites and multi-player communication protocols, which enables us to apply techniques from communication complexity and number theory to this problem. We use these techniques to show tight degree bounds for various classes of functions where no bounds were known previously. Our viewpoint simplifies previously known results in this area, and reveals new connections to well-studied questions about Diophantine equations.

Explicit Ramsey Graphs: A basic open problem regarding polynomials over composites is: can asymmetry in the variables help us compute a symmetric function with low degree? I show a connection between this question and an important open problem in combinatorics, which is to explicitly construct Ramsey graphs, or graphs with no large cliques and independent sets [6]. While good Ramsey graphs are known to exist by probabilistic arguments, explicit constructions have proved elusive. I propose a new algebraic framework for constructing Ramsey graphs and show how several known constructions can all be derived from this framework in a unified manner. I show that all known constructions rely on symmetric polynomials, and that such constructions cannot yield better Ramsey graphs. Thus the question of symmetry versus asymmetry of variables is precisely the barrier to better constructions by such techniques.

Interpolation over Composites: A basic problem in computational algebra is polynomial interpolation, which is to recover a polynomial from its evaluations. Interpolation and related algorithmic tasks which are easy for primes become much
harder, even intractable, over composites. This difference stems from the fact that over primes the number of roots of a polynomial is bounded by the degree, but no such theorem holds for composites. In lieu of this theorem I presented an algorithmic bound; I show how to compute a bound on the degree of a polynomial given its zero set [7]. I use this to give the first optimal algorithms for interpolation, learning and zero-testing over composites. These algorithms are based on new structural results about the zeroes of polynomials. These results were subsequently useful in ruling out certain approaches for better Ramsey constructions [6].

5 Other Research Highlights

My other research work spans areas of theoretical computer science ranging from algorithms for massive data sets to computational complexity. I highlight some of this work below.

Data Stream Algorithms: Algorithmic problems arising from complex networks like the Internet typically involve huge volumes of data. This has led to increased interest in highly efficient algorithmic models like sketching and streaming, which can meaningfully deal with such massive data sets. A large body of work on streaming algorithms focuses on estimating how sorted the input is. This is motivated by the realization that sorting the input is intractable in the one-pass data stream model. In joint work with Krauthgamer, Jayram and Kumar, we presented the first sub-linear space data stream algorithms to estimate two well-studied measures of sortedness: the distance from monotonicity (or Ulam distance for permutations), and the length of the longest increasing subsequence, or LIS. In more recent work with Anna Gál, we prove optimal lower bounds for estimating the length of the LIS in the data-stream model [4]. This is established by proving a direct-sum theorem for the communication complexity of a related problem. The novelty of our techniques is the model of communication that they address. As a corollary, we obtain a separation between two models of communication that are commonly studied in relation to data stream algorithms.

Structural Properties of SAT Solutions: The solution space of random SAT formulae has been studied with a view to better understanding connections between computational hardness and phase transitions from satisfiable to unsatisfiable. Recent algorithmic approaches rely on connectivity properties of the space and break down in the absence of connectivity. In joint work with Kolaitis, Maneva and Papadimitriou, we consider the problem: given a Boolean formula, do its solutions form a connected subset of the hypercube? We classify the worst-case complexity of various connectivity properties of the solution space of SAT formulae in Schaefer's framework [14]. We show that the jump in computational hardness is accompanied by a jump in the diameter of the solution space from linear to exponential.

Complexity of Modular Counting Problems: In joint work with Guruswami and Lipton, we address the complexity of counting the roots of a multivariate polynomial over a finite field F_q modulo some number r [9]. We establish a dichotomy showing that the problem is easy when r is a power of the characteristic of the field and intractable otherwise. Our results give several examples of problems whose decision versions are easy, but whose modular counting versions are hard.

6 Future Research Directions

My broad research goal is to gain a complete understanding of the complexity of problems arising in coding theory, computational learning and related areas; I believe that the right tools for this will come from Boolean function complexity and hardness of approximation. Below I outline some of the research directions I would like to pursue in the future.

List-decoding algorithms have allowed us to break the unique-decoding barrier for error-correcting codes. It is natural to ask if one can perhaps go beyond the list-decoding radius and solve the problem of finding the codeword nearest to a received word at even higher error rates.
On the negative side, we do not currently know any examples of codes where one can do this. But I think that recent results on Reed-Muller codes do offer some hope [13, 21]. Algorithms for solving the nearest codeword problem, if they exist, could also have exciting implications in computational learning. There are concept classes which are well-approximated by low-degree polynomials over finite fields lying just beyond the threshold of what is currently known to be learnable efficiently [20, 22]. Decoding algorithms for Reed-Muller codes that can tolerate very high error rates might present an approach to learning such concept classes.

One of the challenges in algorithmic coding theory is to determine whether known algorithms for list-decoding Reed-Solomon codes [15] and Reed-Muller codes [13, 23] are optimal. This raises both computational and combinatorial questions. I believe that my work with Khot et al. represents a good first step towards understanding the complexity of the decoding/reconstruction problem for multivariate polynomials. Proving similar results for univariate polynomials is an excellent challenge which seems to require new ideas in hardness of approximation.

There is a large body of work proving strong NP-hardness results for problems in computational learning. However, all such results only address the proper learning scenario, where the learning algorithm is restricted to produce a hypothesis from some particular class H which is typically the same as the concept class C. In contrast, known learning algorithms are mostly improper algorithms which could use more complicated hypotheses. For hardness results that are independent of the hypothesis class H used by the algorithm, one currently has to resort to cryptographic assumptions. In ongoing work with Guruswami and Raghavendra, we are investigating the possibility of proving NP-hardness for improper learning.

Finally, I believe that there are several interesting directions to explore in the agnostic learning model. An exciting insight in this area comes from the work of Kalai et al., who show that L1 regression is a powerful tool for noise-tolerant learning [18]. A powerful paradigm in computational learning is to prove that the concept has some kind of polynomial approximation and then recover the approximation. Algorithms based on L1 regression require a weaker polynomial approximation in comparison with previous algorithms (which use L2 regression), but use more powerful machinery for the recovery step. Similar ideas might allow us to extend the boundaries of efficient learning even in the noiseless model; this is a possibility I am currently exploring.

Having worked in areas ranging from data stream algorithms to Boolean function complexity, I view myself as both an algorithm designer and a complexity theorist. I have often found that working on one aspect of a problem gives insights into the other; indeed much of my work has originated from such insights ([12] and [13], [10] and [4], [6] and [7]). I find that this is increasingly the case across several areas in theoretical computer science. My aim is to maintain this balance between upper and lower bounds in my future work.

References

[1] D. A. Barrington, R. Beigel, and S. Rudich. Representing Boolean functions as polynomials modulo composite numbers. Computational Complexity, 4:367-382, 1994.

[2] N. Bhatnagar, P. Gopalan, and R. J. Lipton. Symmetric polynomials over Z_m and simultaneous communication protocols. Journal of Computer & System Sciences (special issue for FOCS'03), 72(2):450-459, 2003.

[3] V. Feldman, P. Gopalan, S. Khot, and A. K. Ponnuswami. New results for learning noisy parities and halfspaces. In Proc. 47th IEEE Symp. on Foundations of Computer Science (FOCS'06), 2006.

[4] A. Gál and P. Gopalan. Lower bounds on streaming algorithms for approximating the length of the longest increasing subsequence. In Proc. 48th IEEE Symp. on Foundations of Computer Science (FOCS'07), 2007.

[5] O. Goldreich and L. Levin. A hard-core predicate for all one-way functions. In Proc. 21st ACM Symposium on the Theory of
Computing (STOC'89), pages 25-32, 1989.

[6] P. Gopalan. Constructing Ramsey graphs from Boolean function representations. In Proc. 21st IEEE Symposium on Computational Complexity (CCC'06), 2006.

[7] P. Gopalan. Query-efficient algorithms for polynomial interpolation over composites. In Proc. 17th ACM-SIAM Symposium on Discrete Algorithms (SODA'06), 2006.

[8] P. Gopalan and V. Guruswami. Deterministic hardness amplification via local GMD decoding. Submitted to 23rd IEEE Symp. on Computational Complexity (CCC'08), 2008.

[9] P. Gopalan, V. Guruswami, and R. J. Lipton. Algorithms for modular counting of roots of multivariate polynomials. In Proc. Latin American Symposium on Theoretical Informatics (LATIN'06), 2006.

[10] P. Gopalan, T. S. Jayram, R. Krauthgamer, and R. Kumar. Estimating the sortedness of a data stream. In Proc. 18th ACM-SIAM Symposium on Discrete Algorithms (SODA'07), 2007.

[11] P. Gopalan, A. T. Kalai, and A. R. Klivans. Agnostically learning decision trees. In Proc. 40th ACM Symp. on Theory of Computing (STOC'08), 2008.

[12] P. Gopalan, S. Khot, and R. Saket. Hardness of reconstructing multivariate polynomials over finite fields. In Proc. 48th IEEE Symp. on Foundations of Computer Science (FOCS'07), 2007.

[13] P. Gopalan, A. R. Klivans, and D. Zuckerman. List-decoding Reed-Muller codes over small fields. In Proc. 40th ACM Symp. on Theory of Computing (STOC'08), 2008.

[14] P. Gopalan, P. G. Kolaitis, E. N. Maneva, and C. H. Papadimitriou. Computing the connectivity properties of the satisfiability solution space. In Proc. 33rd Intl. Colloquium on Automata, Languages and Programming (ICALP'06), 2006.

[15] V. Guruswami and M. Sudan. Improved decoding of Reed-Solomon and algebraic-geometric codes. IEEE Transactions on Information Theory, 45(6):1757-1767, 1999.

[16] J. Håstad. Some optimal inapproximability results. J. ACM, 48(4):798-859, 2001.

[17] J. Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. Journal of Computer and System Sciences, 55:414-440, 1997.

[18] A. T. Kalai, A. R. Klivans, Y. Mansour, and R. A. Servedio. Agnostically learning halfspaces. In Proc. 46th IEEE Symp. on Foundations of Computer Science, pages 11-20, 2005.

[19] E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6):1331-1348, 1993.

[20] A. Razborov. Lower bounds for the size of circuits of bounded depth with basis {∧, ⊕}. Mathematical Notes of the Academy of Sciences of the USSR, (41):333-338, 1987.

[21] A. Samorodnitsky. Low-degree tests at large distances. In Proc. 39th ACM Symposium on the Theory of Computing (STOC'07), pages 506-515, 2007.

[22] R. Smolensky. Algebraic methods in the theory of lower bounds for Boolean circuit complexity. In Proc. 19th ACM Symposium on Theory of Computing (STOC'87), pages 77-82, 1987.

[23] M. Sudan, L. Trevisan, and S. P. Vadhan. Pseudorandom generators without the XOR lemma. J. Comput. Syst. Sci., 62(2):236-266, 2001.

[24] L. Trevisan. List-decoding using the XOR lemma. In Proc. 44th IEEE Symposium on Foundations of Computer Science (FOCS'03), pages 126-135, 2003.

Operating Systems: Glossary of Terms


================================== Glossary ======================================

Operating system: An operating system is a program that manages the computer hardware. The operating system is the one program running at all times on the computer (usually called the kernel), with all else being system programs and application programs. (Chinese gloss, translated: an operating system is a program that manages the computer's hardware; it runs at all times and manages the various system resources.)

Multiprogramming: Multiprogramming is one of the most important aspects of operating systems. Multiprogramming increases CPU utilization by organizing jobs (code and data) so that the CPU always has one to execute. (Chinese gloss, translated: one of the most important parts of an operating system; by organizing jobs it raises CPU utilization and keeps the CPU busy at all times.)

Batch system: A batch system is one in which jobs are bundled together with the instructions necessary to allow them to be processed without intervention. (Chinese gloss, translated: many jobs and their instructions are bundled together and run so that they need not wait for user intervention, improving system efficiency.)

Storage HCIP Test Questions and Answers


1. Regarding the performance of the disks in the various node types of Huawei OceanStor 9000, which ordering from highest to lowest is correct?
A. P25 node SSD > P25 node SAS > P12 SATA > P36 node SATA > C36 SATA
B. P25 node SSD > P25 node SAS > P12 SATA > C36 SATA
C. P25 node SSD > P25 node SAS > P36 node SATA > P12 SATA > C36 SATA
D. P25 node SSD > P25 node SAS > P36 node SATA > C36 SATA > P12 SATA
Answer: C

2. Which of the following commands is not a high-risk command on Huawei storage?
A. import license
B. export license
C. poweroff disk
D. import configuration_data
Answer: B

3. Erasure-code redundancy supports higher reliability and more flexible redundancy policies than traditional RAID algorithms. Which of the following statements about the erasure-code principle is incorrect?
A. When data is written, it is split into N data blocks of equal size.
B. For every N consecutive data blocks, M parity blocks are computed with the erasure-code algorithm.
C. The system stores the N+M blocks in parallel across different nodes.
D. In erasure-code mode, the system tolerates at most M-1 failed disks.
Answer: D

4. Which of the following statements about InfoRevive is correct?
A. The InfoRevive feature of OceanStor 9000 ensures that the continuity of video surveillance services is completely unaffected even when failed nodes/disks exceed the redundancy limit.

B. After read fault-tolerance mode is enabled, when failures in the system exceed the configured redundancy limit, all damaged video file data can still be salvaged and read.
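As context for question 3, the N+M idea can be illustrated with a minimal single-parity (M = 1) XOR sketch; real erasure codes such as those in OceanStor 9000 use more general algebra, and the function names and byte-block framing here are invented:

```python
# Minimal XOR-parity illustration of N+M erasure coding with M = 1:
# any single lost block (data or parity) can be rebuilt from the others.
def make_parity(blocks):
    parity = bytes(len(blocks[0]))
    for b in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, b))
    return parity

def recover(blocks_with_one_none, parity):
    missing = blocks_with_one_none.index(None)   # position of the lost block
    rebuilt = parity
    for i, b in enumerate(blocks_with_one_none):
        if i != missing:
            rebuilt = bytes(x ^ y for x, y in zip(rebuilt, b))
    return rebuilt
```

With M parity blocks instead of one, up to M simultaneous failures are tolerated, which is why option D in question 3 (claiming M-1) is the incorrect statement.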

calligraphy 课后复习

calligraphy 课后复习

Chinese calligraphy

Chinese calligraphy (brush calligraphy) is an art unique to Asian cultures. Shu (calligraphy), Hua (painting), Qin (a stringed musical instrument), and Qi (a strategic board game) are the four basic skills and disciplines of Chinese intellectuals and poets, and calligraphy is regarded as the most abstract and sublime form of art in Chinese culture.

To understand calligraphy and painting, first let's get to know the common tools these two art forms share, namely, the Four Treasures of the Study:

- writing brush
- ink stick
- paper
- ink stone

The best of each of these items is represented by the Hu brush (湖笔), the Hui ink stick (徽墨), Xuan paper (宣纸), and the Duan ink stone (端砚).

Calligraphy is the art of writing Chinese characters with the Four Treasures of the Study. To understand calligraphy, one must first know something about Chinese characters.

The development of Chinese characters and calligraphy

In ancient China, the oldest existing Chinese characters are oracle bone scripts carved on ox scapulae and tortoise plastrons, because the rulers of the Shang dynasty carved pits on such animal bones and then baked them to seek auspices about military affairs, agricultural harvests, or even childbirth and the weather. During the divination ceremony, after the cracks were made, the characters were written with a brush on the shell or bone to be later carved. With the development of Jinwen (bronzeware script) and Dazhuan (large seal script), "cursive" signs continued. Moreover, each archaic kingdom of current China had its own set of characters. (The Chinese note, translated: oracle bone script refers to the characters carved on tortoise shells or animal bones by the royal house of the late Shang dynasty to record divinations.)

  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

A Bottom-up Merging Algorithm for Chinese Unknown Word Extraction

Wei-Yun Ma
Institute of Information Science, Academia Sinica
ma@.tw

Keh-Jiann Chen
Institute of Information Science, Academia Sinica
kchen@.tw

Abstract

Statistical methods for extracting Chinese unknown words usually suffer from the problem that superfluous character strings with strong statistical associations are extracted as well. To solve this problem, this paper proposes using a set of general morphological rules to broaden the coverage; on the other hand, the rules are appended with different linguistic and statistical constraints to increase the precision of the representation. To disambiguate rule applications and reduce the complexity of the rule matching, a bottom-up merging algorithm for extraction is proposed, which merges possible morphemes recursively by consulting the general rules and dynamically decides which rule should be applied first according to the priorities of the rules. Effects of different priority strategies are compared in our experiment, and the experimental results show that the performance of the proposed method is very promising.

1 Introduction and Related Work

Chinese sentences are strings of characters with no delimiters to mark word boundaries. Therefore the initial step of Chinese processing is word segmentation. However, occurrences of unknown words, which are not listed in the dictionary, significantly degrade the performance of most word segmentation methods, so unknown word extraction has become a key technology for Chinese segmentation. For unknown words with more regular morphological structures, such as personal names, morphological rules are commonly used to improve performance by restricting the structures of extracted words (Chen et al. 1994, Sun et al. 1994, Lin et al. 1994).
However, it is not possible to list morphological rules for all kinds of unknown words, especially words with very irregular structures, which have the characteristics of variable lengths and flexible morphological structures, such as proper names, abbreviations, etc. Therefore, statistical approaches usually play major roles in irregular unknown word extraction in most previous work (Sproat & Shih 1990, Chiang et al. 1992, Tung and Lee 1995, Palmer 1997, Chang et al. 1997, Sun et al. 1998, Ge et al. 1999).

For statistical methods, an important issue is how to resolve competing ambiguous extractions, which might include erroneous extractions of phrases or partial phrases, since these may have statistical significance in a corpus as well. Very frequently, superfluous character strings with strong statistical associations are extracted. These wrong results are usually hard to filter out unless deep content and context analyses are performed. To solve this problem, the idea of an unknown word detection procedure prior to extraction has been proposed. Lin et al. (1993) adopt the following strategy: first, they decide whether there is any unknown word within a detected region of fixed size in a sentence, and then they extract the unknown word from that region by a statistical method if the previous answer is "yes". A limitation of this method is that it allows at most one unknown word in the detected region, so it cannot deal with occurrences of consecutive unknown words within a sentence. Chen & Ma (2002) adopt another strategy: after an initial segmentation process, each monosyllable is judged to be either a common word or a morpheme of an unknown word by a set of syntactic discriminators. The syntactic discriminators are a set of syntactic patterns containing monosyllabic words, learned from a large word-segmented corpus, that discriminate between monosyllabic words and morphemes of unknown words.
Then deeper analysis can be carried out on the detected unknown word morphemes to extract unknown words.

In this paper, in order to avoid extracting superfluous character strings with high frequencies, we propose using a set of general rules, formulated as context-free grammar rules that compose detected morphemes and their adjacent tokens, to match all kinds of unknown words; the set includes, for instance, the rule (UW → UW UW). To avoid too many superfluous extractions caused by the overly general rules, the rules are appended with linguistic or statistical constraints. To disambiguate between rule applications and reduce the complexity of the rule matching, a bottom-up merging algorithm for extraction is proposed, which merges possible morphemes recursively by consulting the general rules and dynamically decides which rule should be applied first according to the priorities of the rules.

The paper is organized into 7 sections. In the next section, we provide an overview of our system. Section 3 briefly introduces the unknown word detection process and provides some analysis to help derive the general rules for unknown words. In section 4, we derive a set of general rules to represent all kinds of unknown words, and then refine them by appending rule constraints and priorities. In section 5, a bottom-up merging algorithm is presented for unknown word extraction. In section 6, the evaluation of extraction is presented; we also compare the performance of different priority strategies. Finally, in section 7, we draw conclusions and propose some future work.

2 System Overview

The purpose of our unknown word extraction system is to extract all types of unknown words from a Chinese text online. Figure 1 illustrates the block diagram of the system proposed in this paper. Initially, the input sentence is segmented by a conventional word segmentation program.
As a result, each unknown word in the sentence will be segmented into several adjacent tokens (known words or monosyllabic morphemes). At the unknown word detection stage, every monosyllable is judged to be either a word or an unknown word morpheme by a set of syntactic discriminators, which are learned from a corpus. Afterward, a bottom-up merging process applies the general rules to extract unknown word candidates. Finally, the input text is re-segmented by consulting the system dictionary and the extracted unknown word candidates to get the final segmented result.

Figure 1. Flowchart of the system

(1) if can increase gross profit rate
    "if gross profit rate can be increased…"
(2) after first-step word segmentation:
    after unknown word detection: (?) (?) (?)
    after unknown word extraction:

For example, the correct segmentation of (1) is shown, but the unknown word "" is segmented into three monosyllabic words after the first step of the word segmentation process, as shown in (2). The unknown word detection process will mark the sentence as "() () () (?) (?) (?)", where (?) denotes a detected monosyllabic unknown word morpheme and () denotes a common word. During the extracting process, the rule matching process focuses only on the morphemes marked with (?) and tries to combine them with their left/right neighbors according to the rules for unknown words. After that, the unknown word "" is extracted. During the process, we do not need to take care of other superfluous combinations such as "", even though they might have strong statistical association or co-occurrence too.

3 Analysis of Unknown Word Detection

The unknown word detection method proposed by Chen & Bai (1998) is applied in our system. It adopts a corpus-based learning algorithm to derive a set of syntactic discriminators, which are used to distinguish whether a monosyllable is a word or an unknown word morpheme after an initial segmentation process.
If all occurrences of monosyllabic words are considered morphemes of unknown words, the recall of the detection is about 99%, but the precision is as low as 13.4%. The basic idea in Chen & Bai (1998) is that the complementary problem of unknown word detection is the problem of monosyllabic known-word detection, i.e., removing the monosyllabic known words as candidates for unknown word morphemes. Chen and Bai (1998) adopt ten types of context rule patterns, as shown in Table 1, to generate rule instances from a training corpus. The generated rule instances were checked for applicability and accuracy. Each rule contains a key token within curly brackets and its contextual tokens without brackets. For some rules there may be no contextual dependencies. The function of each rule is that, in a sentence, if a character and its context match the key token and the contextual tokens of the rule respectively, this character is a common word (i.e., not a morpheme of an unknown word). For instance, the rule "{Dfa} Vh" says that a character with syntactic category Dfa is a common word, if it follows a word of syntactic category Vh.

Rule type                Example
=========================================
char                     {}
word char                {}
char word                {}
category                 {T}
{category} category      {Dfa} Vh
category {category}      Na {Vcl}
char category            {} VH
category char            Na {}
category category char   Na Dfa {}
char category category   {} Vh T
=========================================
Table 1. Rule types and examples

The final rule set contains 45,839 rules, which were used to detect unknown words in the experiment. It achieves a detection rate of 96% and a precision of 60%, where a detection rate of 96% means that for 96% of the unknown words in the testing data, at least one of their morphemes is detected as part of an unknown word, and a precision of 60% means that 60% of the detected monosyllables in the testing data are actually morphemes.
Although the precision is not high, most over-detection errors are "isolated", meaning there are few situations in which two adjacent detected monosyllabic unknown morphemes are both wrong at the same time. These operative characteristics are very important in guiding the design of the general rules for unknown words below.

4 Rules for Unknown Words

Although morphological rules work well for regular unknown word extraction, it is difficult to induce morphological rules for irregular unknown words. In this section, we try to represent a common structure for unknown words from another point of view: an unknown word is regarded as a combination of morphemes which are consecutive morphemes/words in context after segmentation, most of which are monosyllables. We adopt context-free grammar (Chomsky 1956), the most commonly used generative grammar for modelling constituent structures, to express our unknown word structure.

4.1 Rule Derivation

According to the discussion in section 3, for 96% of unknown words at least one of the morphemes is detected as part of the unknown word, which motivates us to represent the unknown word structure with at least one detected morpheme. Taking this phenomenon into consideration, the rules for modeling unknown words and an unknown word example are presented as follows.

UW → UW UW          (1)
   | ms(?) ms(?)    (2)
   | ms(?) ps()     (3)
   | ms(?) ms()     (4)
   | ps() ms(?)     (5)
   | ms() ms(?)     (6)
   | ms(?) UW       (7)
   | ms() UW        (8)
   | ps() UW        (9)
   | UW ms(?)       (10)
   | UW ms()        (11)
   | UW ps()        (12)

Notes: There is one non-terminal symbol, "UW", which denotes "unknown word" and is also the start symbol. There are three terminal symbols: ms(?), which denotes a detected monosyllabic unknown word morpheme; ms(), which denotes a monosyllable not detected as a morpheme; and ps(), which denotes a polysyllabic (more than one syllable) known word.

Table 2. General rules for unknown words

Figure 2.
A possible structure for the unknown word "" (Chen Zhi Ming), which is initially segmented and detected as "(?) (?) ()"; "" was marked incorrectly at the detection stage.

There are three kinds of commonly used measures for evaluating grammars: 1. generality (recall), the range of sentences the grammar analyzes correctly; 2. selectivity (precision), the range of non-sentences it identifies as problematic; and 3. understandability, the simplicity of the grammar itself (Allen 1995). For generality, 96% of unknown words have this kind of structure, so the grammar has high generality in generating unknown words. But for selectivity, our rules over-generate: many patterns accepted by the rules are not words. The main reason is that the rules have to include non-detected morphemes for high generality, so selectivity is sacrificed for the moment. In the next section, the rules are constrained by linguistic and text-based statistical constraints to compensate for the selectivity of the grammar. For understandability, note that each of the rules (1)-(12) consists of just two right-hand-side symbols. The reason for this kind of presentation is that it regards the unknown word structure as a series of combinations of two consecutive morphemes, so that we can simplify the analysis of the unknown word structure by analyzing only its combinations of two consecutive morphemes.

4.2 Appending Constraints

Since the general rules in Table 2 have high generality and low selectivity in modeling unknown words, we append constraints to restrict their application. However, there are tradeoffs between generality and selectivity: higher selectivity usually results in lower generality.
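As an illustrative aside (not from the original paper), the binary right-hand sides of rules (1)-(12) in Table 2 make rule matching a simple pair lookup; a hypothetical sketch, where the token-type labels "UW", "ms?", "ms", and "ps" are shorthand invented here:

```python
# Hypothetical encoding of the general rules of Table 2 as pairs of
# right-hand-side token types:
#   "UW"  - unknown word (merged token),  "ms?" - detected monosyllabic morpheme,
#   "ms"  - undetected monosyllable,      "ps"  - polysyllabic known word.
RULES = {
    1:  ("UW",  "UW"),
    2:  ("ms?", "ms?"),
    3:  ("ms?", "ps"),
    4:  ("ms?", "ms"),
    5:  ("ps",  "ms?"),
    6:  ("ms",  "ms?"),
    7:  ("ms?", "UW"),
    8:  ("ms",  "UW"),
    9:  ("ps",  "UW"),
    10: ("UW",  "ms?"),
    11: ("UW",  "ms"),
    12: ("UW",  "ps"),
}

def matching_rule(left_type, right_type):
    """Return the rule number licensing this adjacent token pair, or None."""
    for rule_id, rhs in RULES.items():
        if rhs == (left_type, right_type):
            return rule_id
    return None
```

Because every rule has exactly two right-hand-side symbols, a merging procedure never needs general parsing: it only asks, for each adjacent pair, whether some rule licenses the merge.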
In order to keep generality high while assigning constraints, we assign different constraints to different rules according to their characteristics, such that generality is degraded only slightly while selectivity is upgraded significantly.

The rules in Table 2 are classified into two kinds: one kind is the rules in which both right-hand-side symbols consist of detected morphemes, i.e., (1), (2), (7), and (10); the other is the rules in which just one right-hand-side symbol consists of detected morphemes, i.e., (3), (4), (5), (6), (8), (9), (11), and (12). The former are regarded as "strong" structures, since they are considered more likely to compose an unknown word or an unknown word morpheme, and the latter are regarded as "weak" structures, considered less likely to compose an unknown word or an unknown word morpheme. The basic idea is to assign more constraints to the rules with weak structures and fewer constraints to the rules with strong structures.

The constraints we apply include word length, linguistic, and statistical constraints. For the statistical constraints, since the target of our system is to extract unknown words from a text, we use text-based statistical measures. It is well known that keywords often reoccur in a document (Church 2000), and very possibly the keywords are also unknown words; therefore the reoccurrence frequency within a document is adopted as a constraint. Another useful statistical phenomenon in a document is that a polysyllabic morpheme is very unlikely to be a morpheme of two different unknown words within the same text. Hence we restrict the rules with polysyllabic symbols by evaluating the conditional probability of the polysyllabic symbols. In addition, syntactic constraints are also utilized here.
For most unknown word morphemes, the syntactic categories belong to "bound", "verb", "noun", and "adjective" rather than "conjunction", "preposition", etc. So we restrict the rules with non-detected symbols by checking whether the syntactic categories of their non-detected symbols belong to "bound", "verb", "noun", or "adjective". To avoid unlimited recursive rule application, the length of a matched unknown word is restricted unless a very strong statistical association occurs between the two matched tokens. The constraints adopted so far are presented in Table 3. A rule may be restricted by multiple constraints.

Constraint                                        Rules restricted
Freq_docu(LR) >= Threshold                        (3) (4) (5) (6) (8) (9) (11) (12)
P_docu(L|R) = 1                                   (1) (3) (7) (8) (9) (12)
P_docu(R|L) = 1                                   (1) (5) (9) (10) (11) (12)
Category(L) is bound, verb, noun, or adjective    (5) (6) (8) (9)
Category(R) is bound, verb, noun, or adjective    (3) (4) (11) (12)

Notes: L denotes the left terminal of the right-hand side; R denotes the right terminal of the right-hand side. Threshold is a function of Length(LR) and text size; the basic idea is that a larger Length(LR) or text size corresponds to a larger Threshold.

Table 3. Constraints for the general rules

4.3 Priority

To schedule and rank ambiguous rule matchings, each step of rule matching is associated with a priority measure calculated from the association strength of the right-hand-side symbols. In our extracting algorithm, the priority measure helps the extracting process dynamically decide which rule should be applied first. A more detailed discussion of the ambiguity problem and the complete disambiguation process is presented in section 5. We regard the possibility of a rule application as the co-occurrence and association strength of its right-hand-side symbols within a text. In other words, a rule has higher priority of application when its right-hand-side symbols are strongly associated with each other, or co-occur frequently in the same text.
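The constraint scheme of Table 3 can be sketched as a predicate over a candidate merge; the function signature, the document-local probabilities passed in as arguments, and the flat threshold below are hypothetical simplifications of the paper's scheme (in the paper, Threshold depends on Length(LR) and text size):

```python
# Hypothetical sketch of the Table 3 constraint checks. A candidate merge
# of tokens L and R under rule `rule_id` passes only if every constraint
# attached to that rule holds. All counts/probabilities are document-local.
STRONG_CATS = {"bound", "verb", "noun", "adjective"}

def satisfies_constraints(rule_id, freq_lr, p_l_given_r, p_r_given_l,
                          cat_l, cat_r, threshold):
    # Reoccurrence frequency of the merged string within the document.
    if rule_id in {3, 4, 5, 6, 8, 9, 11, 12} and freq_lr < threshold:
        return False
    # A polysyllabic/UW symbol should not belong to two different words,
    # so its conditional probability given the other side must be 1.
    if rule_id in {1, 3, 7, 8, 9, 12} and p_l_given_r != 1.0:
        return False
    if rule_id in {1, 5, 9, 10, 11, 12} and p_r_given_l != 1.0:
        return False
    # Syntactic category restriction on non-detected symbols.
    if rule_id in {5, 6, 8, 9} and cat_l not in STRONG_CATS:
        return False
    if rule_id in {3, 4, 11, 12} and cat_r not in STRONG_CATS:
        return False
    return True
```

Note how the "strong" rule (2) carries no constraint at all, while "weak" rules such as (9) and (12) must pass three checks, mirroring the strong/weak classification of section 4.2.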
Many statistical measures estimating co-occurrence and degree of association have been proposed in previous research, such as mutual information (Church 1990, Sproat 1990), t-score (Church 1991), and the Dice coefficient (Smadja 1993, 1996). Here, we adopt four well-developed statistical measures as the priority individually: mutual information (MI), a variant of mutual information (VMI), t-score, and co-occurrence. The formulas are listed in Table 4. MI mainly focuses on association strength, while VMI and t-score consider both co-occurrence and association strength. The performances of these four measures are evaluated in the experiments discussed in section 6.

====================================
co-occurrence(L, R) = f(L, R)
------------------------------------
MI(L, R) = log( P(L, R) / ( P(L) P(R) ) )
------------------------------------
VMI(L, R) = f(L, R) * MI(L, R)
------------------------------------
t-score(L, R) = ( f(L, R) - f(L) f(R) / N ) / sqrt( f(L, R) )

Notes: f(L, R) denotes the number of occurrences of the pair L R in the text; N denotes the number of occurrences of all tokens in the text; length(*) denotes the length of *.
====================================
Table 4. Formulas of the four kinds of priority

5 Unknown Word Extraction

5.1 Ambiguity

Even though the general rules are appended with well-designed constraints, ambiguous matchings, such as overlapping and covering, still exist. We take the following instance to illustrate this: "" (La Fa Yeh), a warship name, occurs frequently in the text and is segmented and detected as "(?) (?) (?)". Although "" could be derived as an unknown word "(())" by rule 2 and rule 10, "" and "" might also be derived as unknown words "()" and "()" individually by rule 2.
Hence there are in total three possible ambiguous unknown words, and only one is actually correct.

Several approaches to unsupervised segmentation of Chinese words have been proposed to solve the overlapping ambiguity of determining whether to group "xyz" as "xy z" or "x yz", where x, y, and z are Chinese characters. Sproat and Shih (1990) adopt a greedy algorithm: group the pair of adjacent characters with the largest mutual information greater than some threshold within a sentence, and apply the algorithm recursively to the rest of the sentence until no character pair satisfies the threshold. Sun et al. (1998) use various association measures, such as t-score besides mutual information, to improve on Sproat & Shih (1990). They developed an efficient algorithm to solve overlapping character pair ambiguity.

5.2 Bottom-up Merging Algorithm

Following the greedy strategy of Sproat & Shih (1990), we present here an efficient bottom-up merging algorithm that consults the general rules to extract unknown words. The basic idea is that, for a segmented sentence, if there are many rule-matched token pairs which also satisfy the rule constraints, the token pair with the highest rule priority within the sentence is merged first and forms a new token string. The same procedure is then applied to the updated token string recursively until no token pair satisfies the general rules. This is illustrated by the following example:

======================================
System environment:
Co-occurrence priority is adopted.
Text environment:
"" (Chen Zhi Qiang), an unknown word, occurs three times.
"" (take part in an election), an unknown word, occurs two times.
"" (Chen Zhi Qiang took part in an election), a sentence, occurs one time.
Input, after initial segmentation and detection:
(?) (?) (?) (?) (?)
 3   3   1   2      priority
After the first iteration:
(?) (?) (uw)
After the third iteration:
(uw) (uw)
======================================
Figure 3. Extraction process of the input "".
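A minimal sketch of the greedy merging loop behind Figure 3, assuming co-occurrence priority and omitting the rule-constraint checks of section 4.2; the names, the flat threshold, and the convention that a merged token keeps behaving as a UW are assumptions of this sketch:

```python
from collections import Counter

def bottom_up_merge(tokens, detected, pair_count, threshold=2):
    """Greedy bottom-up merging sketch (constraint checks omitted).
    tokens:     list of token strings after initial segmentation
    detected:   parallel list of booleans, True for '(?)' morphemes
    pair_count: Counter mapping (left, right) -> co-occurrence in the text
    """
    tokens, detected = list(tokens), list(detected)
    while True:
        # Find the adjacent pair with the highest co-occurrence priority
        # in which at least one side is a detected morpheme or a merged UW.
        best_i, best_p = None, threshold - 1
        for i in range(len(tokens) - 1):
            if not (detected[i] or detected[i + 1]):
                continue
            p = pair_count[(tokens[i], tokens[i + 1])]
            if p > best_p:
                best_i, best_p = i, p
        if best_i is None:          # no pair reaches the threshold: stop
            return tokens, detected
        merged = tokens[best_i] + tokens[best_i + 1]
        tokens[best_i:best_i + 2] = [merged]
        detected[best_i:best_i + 2] = [True]   # merged token acts as a UW
```

Replaying the Figure 3 scenario with five tokens A B C D E, where the name ABC occurs three times and the word DE twice, the loop merges A+B, then AB+C, then D+E, and stops with the two unknown word candidates ABC and DE.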
With the general rules and the greedy strategy, the algorithm is able to deal not only with overlapping character pair ambiguity but also with more complex overlapping and coverage ambiguities, even those resulting from consecutive unknown words. In Figure 3, the input sentence "" is derived as the two correct unknown words "(())" and "()" by rule (2), rule (10), and rule (2) in turn. "" and "" are not merged further because P(|) < 1 violates the constraint of rule (1). The same reason explains why "" and "" do not satisfy rule (10) in the third iteration. With this simple algorithm, unknown words of unlimited length can all potentially be extracted. Observing the extraction process of "", one can see that the boundaries of unknown words may extend during the iterations until no rule can be applied.

6 Experiment

In our experiments, a word is considered an unknown word if it is neither in the CKIP lexicon nor identified by the word segmentation program as a foreign word (for instance, English) or a number. The CKIP lexicon contains about 80,000 entries.

6.1 Evaluation Formulas

The extraction process is evaluated in terms of precision and recall. The target of our approach is to extract unknown words from a text, so we define "correct extractions" as the unknown word types correctly extracted in the text. The precision and recall formulas are as follows:

NC_i = number of correct extractions in document i
NE_i = number of extracted unknown words in document i
NT_i = number of total unknown words in document i

Precision rate = ( Σ_{i=1}^{150} NC_i ) / ( Σ_{i=1}^{150} NE_i )
Recall rate = ( Σ_{i=1}^{150} NC_i ) / ( Σ_{i=1}^{150} NT_i )

6.2 Data Sets

We use the Sinica balanced corpus version 3.0 as our training set for unknown word detection; it contains 5 million segmented words tagged with part-of-speech. We randomly selected 150 documents of Chinese news from the internet as our testing set.
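The pooled precision and recall formulas above can be sketched as:

```python
def micro_precision_recall(docs):
    """docs: list of (NC_i, NE_i, NT_i) triples, one per document.
    Returns (precision, recall) pooled over all documents, i.e. the sums
    of correct/extracted/total counts rather than per-document averages."""
    nc = sum(d[0] for d in docs)   # total correct extractions
    ne = sum(d[1] for d in docs)   # total extracted unknown words
    nt = sum(d[2] for d in docs)   # total unknown words in the documents
    return nc / ne, nc / nt
```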
These testing data are segmented by hand according to the segmentation standard for information processing designed by Academia Sinica (Huang et al. 1997). On average, each testing text contains about 300 words and 16.6 unknown word types.

6.3 Results

Based on the four priority measures listed in Table 4, the bottom-up merging algorithm is applied. The performances are shown in Table 5.

Table 5. Experimental results of the four different priority measures

In Table 5, comparing co-occurrence and MI, we find that the co-occurrence measure performs better than MI on both precision and recall. The probable reason is that the characteristic of reoccurrence of unknown words is more important than the morphological association of unknown words when extracting unknown words from a size-limited text. This is because different unknown words sometimes share the same morpheme in a document, and if we use MI as the priority, these unknown words will have low MI values for their morphemes. Even if they have higher frequency, they are still easily sacrificed when competing with their adjacent unknown word candidates. This explanation is also supported by the performances of VMI and t-score, which put more emphasis on co-occurrence in their formulas and are better than that of MI. According to the above discussion, we adopt co-occurrence for priority decision making in our unknown word extraction system.

In our final system, we adopt morphological rules to extract regular-type unknown words and the general rules to extract the remaining irregular unknown words; the total performance is a recall of 57% and a precision of 76%. An older system using the morphological rules for names of people and compounds with prefixes or suffixes, without the general rules, was tested and achieved a recall of 25% and a precision of 80%.
The general rules improve the recall by 32% without sacrificing too much precision.

7 Conclusion and Future Work

In this research, Chinese word segmentation and unknown word extraction have been integrated into one framework. To increase the coverage of the morphological rules, we first derive a set of general rules to represent all kinds of unknown words. To avoid extracting superfluous character strings, we then append these rules with linguistic and statistical constraints. We propose an efficient bottom-up merging algorithm that consults the general rules to extract unknown words and uses priority measures to resolve rule matching ambiguities. In the experiment, we compare the effects of different priority strategies, and the experimental results show that the co-occurrence measure performs best.

It is found that the performance of unknown word detection affects the overall performance significantly. Although the performance of unknown word detection is not bad, there is still room for improvement. Possible strategies for improvement in our future work include using contextual semantic relations in detection, and updated statistical methods, such as support vector machines, maximum entropy, and so on, to achieve better performance of unknown word detection.

References

[1] Chen, H.H., & J.C. Lee, 1994, "The Identification of Organization Names in Chinese Texts", Communication of COLIPS, Vol. 4, No. 2, 131-142.
[2] Sun, M. S., C.N. Huang, H.Y. Gao, & Jie Fang, 1994, "Identifying Chinese Names in Unrestricted Texts", Communication of COLIPS, Vol. 4, No. 2, 113-122.
[3] Lin, M. Y., T. H. Chiang, & K. Y. Su, 1993, "A Preliminary Study on Unknown Word Problem in Chinese Word Segmentation," Proceedings of ROCLING VI, pp. 119-137.
[4] Richard Sproat and Chilin Shih, "A Statistical Method for Finding Word Boundaries in Chinese Text," Computer Processing of Chinese and Oriental Languages, 4, 336-351, 1990.
[5] Sun, Maosong, Dayang Shen, and Benjamin K. Tsou. 1998.
Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data. In Proceedings of COLING-ACL '98, pages 1265-1271.
[6] Ge, Xianping, Wanda Pratt, and Padhraic Smyth. 1999. Discovering Chinese Words from Unsegmented Text. In SIGIR '99, pages 271-272.
[7] Palmer, David. 1997. A Trainable Rule-based Algorithm for Word Segmentation. In Proceedings of the Association for Computational Linguistics.
[8] Chiang, T. H., M. Y. Lin, & K. Y. Su, 1992, "Statistical Models for Word Segmentation and Unknown Word Resolution," Proceedings of ROCLING V, pp. 121-146.
[9] Chang, Jing-Shin and Keh-Yih Su, 1997, "An Unsupervised Iterative Method for Chinese New Lexicon Extraction", International Journal of Computational Linguistics & Chinese Language Processing, 1997.
[10] C.H. Tung and H. J. Lee, "Identification of unknown words from corpus," International Journal of Computer Processing of Chinese and Oriental Languages, Vol. 8, Supplement, pp. 131-146, 1995.
[11] Chen, K.J. & Wei-Yun Ma, 2002. Unknown Word Extraction for Chinese Documents. In Proceedings of COLING 2002, pages 169-175.
[12] Chen, K.J. & Ming-Hong Bai, 1998, "Unknown Word Detection for Chinese by a Corpus-based Learning Method," International Journal of Computational Linguistics and Chinese Language Processing, Vol. 3, No. 1, pp. 27-44.
[13] Church, Kenneth W., 2000, "Empirical Estimates of Adaptation: The Chance of Two Noriegas is Closer to p/2 than p*p", Proceedings of COLING 2000, pp. 180-186.
[14] Allen, James, 1995, Natural Language Understanding, Second Edition, page 44.
[15] Chen, K.J. & S.H. Liu, 1992, "Word Identification for Mandarin Chinese Sentences," Proceedings of the 14th COLING, pp. 101-107.
[16] Huang, C. R. et al., 1995, "The Introduction of Sinica Corpus," Proceedings of ROCLING VIII, pp. 81-89.
[17] Huang, C.R., K.J.
Chen, & Li-Li Chang, 1997, "Segmentation Standard for Chinese Natural Language Processing," International Journal of Computational Linguistics and Chinese Language Processing, accepted.
[18] Chomsky, N., 1956, Three models for the description of language. IRE Transactions on Information Theory, 2, 113-124.
[19] Church, K. and Hanks, P., "Word Association Norms, Mutual Information and Lexicography," Computational Linguistics, Vol. 16, March 1990, pp. 22-29.
[20] Smadja, Frank, "Retrieving Collocations from Text: Xtract," Computational Linguistics, Vol. 19, No. 1, 1993, pp. 143-177.
[21] Smadja, Frank, McKeown, K.R. and Hatzivasiloglou, V., "Translating Collocations for Bilingual Lexicons," Computational Linguistics, Vol. 22, No. 1, 1996.
[22] Church, K., W. Gale, P. Hanks, and D. Hindle, 1991, "Using Statistics in Lexical Analysis," in Zernik (ed.), Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, pp. 115-164, Lawrence Erlbaum Associates.
