Developing a test collection for biomedical word sense disambiguation
Marc Weeber, PhD, James G. Mork, MSc, Alan R. Aronson, PhD
{weeber,mork,alan}@
Lister Hill National Center for Biomedical Communications
National Library of Medicine
8600 Rockville Pike
Bethesda, MD 20894

Ambiguity, the phenomenon that a word has more than one sense, poses difficulties for many current Natural Language Processing (NLP) systems. Algorithms that assist in the resolution of these ambiguities, i.e. disambiguate a word, or more generally, a text string, will boost performance of these systems. To test such techniques in the biomedical language domain, there is the need for a test collection of disambiguated ambiguous strings. We report on the development of a Word Sense Disambiguation (WSD) test collection that comprises 5,000 disambiguated instances for 50 ambiguous UMLS® Metathesaurus® strings.

INTRODUCTION

Consider the following sentences that include the word cold taken from three different MEDLINE® abstracts:¹

(1) A greater proportion of mesophil micro-organisms were to be found during the cold months than in warmer months.
(2) In a controlled randomised trial we analysed whether the use of the term “smoker’s lung” instead of chronic bronchitis when talking to patients with chronic obstructive lung disease (COLD) changed their smoking habits.
(3) The overall infection rate was 83% and of those infected, 88% felt that they had a cold.

The sense of the word cold is different in each sentence. Cold in sentence (1) is an indication of the temperature, in sentence (2) the acronym of chronic obstructive lung disease, and in sentence (3) cold is a disease. The fact that a single word may have more than one sense is called ambiguity. In natural language, ambiguity occurs at many levels, e.g., lexical, structural, semantic, and pragmatic. Also, it pervades normal language use; humans have to disambiguate constantly (and subconsciously) in normal communication using textual and other types of context.

The general opinion is that language in more restricted
environments, such as medical research, is more specific and straightforward; there is less ambiguity. This may well be the case, but ambiguity is still present, as shown by the examples above. Additionally, the UMLS® Metathesaurus® [2], the largest medical thesaurus, has more than 7,400 ambiguous strings that map to more than one thesaurus concept [3]. The word cold, for instance, maps to six different UMLS concepts, three of which we used in sentences (1)–(3).

¹ The PubMed [1] IDs are 9477717, 9411973, and 9578931, respectively.

MEDICAL NLP AND AMBIGUITY

Medical NLP systems, generally designed to analyze medical texts for decision support or indexing purposes, have to deal with ambiguities in language. Columbia University's MedLEE system, originally designed for a small medical (and language) domain, has been applied to different fields within medicine. One of the problems encountered when broadening the scope of such a system is the introduction of ambiguities. A term or word has different senses in different medical disciplines. MedLEE has some ad-hoc rules to deal with ambiguities, but there is a need for new, machine learning (ML) techniques and a good collection of training data [4].

The objective of the National Library of Medicine (NLM)'s Indexing Initiative is to investigate NLP methods whereby automated indexing techniques can partially or completely substitute for current (manual) indexing practices [5]. Error analysis of the indexing system shows that the major problems concern ambiguity of strings. Also, MetaMap, a text to concept mapping program [6,7], is currently unable to disambiguate ambiguous concepts. The DAD-system, a concept-based tool for literature-based discovery in biomedicine [8,9], uses MetaMap for the processing of MEDLINE texts. In replicating Swanson's literature-based discovery of the involvement of magnesium deficiency in migraine [10], the DAD-system showed that the abbreviation mg might be interesting for treating migraine. However, the DAD-system is not able to distinguish between the UMLS concepts Magnesium and Milligram for mg. This means that spurious information on milligram is included in the system's output [9]. In their recent study on UMLS concept indexing, Nadkarni et al. think a fully automatic procedure is not yet feasible, in part because of ambiguity problems [11].

Though there is clearly a need, the only studies on biomedical word sense disambiguation are [12] and [4]. These two studies use rule-based approaches for a few cases in small domains. Recently, WSD has seen an upsurge of interest in computational linguistics, illustrated by a 1998 special issue of Computational Linguistics, Vol. 24(1) and a 2000 special issue of Computers and the Humanities, Vol. 34(1/2). Additionally, there are the SENSEVAL workshops.² The time is ripe to test the newly developed algorithms in the biomedical language domain. Essential for testing the algorithms is a collection of manually disambiguated biomedical text strings for use as a gold standard. This paper reports on the development of such a WSD test collection.

EXTENT OF AMBIGUITY IN MEDLINE

To appreciate the amount of ambiguity present in MEDLINE, we processed the 409,337 citations added to the citation database in 1998. The processing consisted of finding UMLS concepts in the titles and abstracts of these citations by means of the MetaMap program. MetaMap chunks the sentences into (mostly noun) phrases that are mapped to UMLS concepts. In this experiment, we use the 1999 version of the UMLS. Table 1 displays some basic statistics.

Table 1: Mapping Results for 1998 MEDLINE.

No. of citations                  409,337
No. of non-ambiguous phrases   30,514,468
No. of ambiguous phrases        4,051,445

We observe that 11.7% of the more than 34 million phrases result in more than one mapping to UMLS concepts, i.e. there is an ambiguous mapping. The differences between concepts are best depicted by the different semantic types that have been assigned to them.
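
The 11.7% figure can be recomputed from the two phrase counts in Table 1. The following is a minimal sketch of that arithmetic; the constant and function names are illustrative assumptions and are not part of the tooling used in the study.

```python
# Phrase counts as reported in Table 1 (1998 MEDLINE citations, 1999 UMLS).
NON_AMBIGUOUS_PHRASES = 30_514_468
AMBIGUOUS_PHRASES = 4_051_445


def ambiguity_rate(ambiguous: int, non_ambiguous: int) -> float:
    """Return the fraction of phrases that map to more than one concept."""
    total = ambiguous + non_ambiguous
    return ambiguous / total


total_phrases = NON_AMBIGUOUS_PHRASES + AMBIGUOUS_PHRASES
rate = ambiguity_rate(AMBIGUOUS_PHRASES, NON_AMBIGUOUS_PHRASES)

print(f"total phrases:   {total_phrases:,}")  # 34,565,913
print(f"ambiguous share: {rate:.1%}")         # 11.7%
```

The total of roughly 34.6 million phrases agrees with the "more than 34 million" stated in the text.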
Studying the data, we observed three types of ambiguities: a) simple ambiguities in which a string maps to more than one UMLS concept (94.3% of all cases), b) lexical ambiguities (5.5%), and c) complex ambiguities (0.2%). See Table 2 for examples.

Table 2: Three Types of Ambiguities.

Type                                    UMLS concept          Semantic type
Simple: activity                        Activity1             Finding
                                        Activity2             Daily or recr. activity
                                        %activity             Quantitative concept
Lexical: reported                       Reporting             Health care activity
                                        Reports               Intellectual product
                                        Report2               Intellectual product
Complex: reproductive health policies   Reproduction          Organism function
                                        + Health              Idea or concept
                                        + Policies            Regulatory activity
                                        Reproductive Health   Occupation or discipline
                                        + Policies            Regulatory activity
                                        Reproduction          Organism function
                                        + Health Policies     Regulation or law

METHODS

Because complex ambiguities are both difficult and rare, and because lexical ambiguities should be resolved by better parsing strategies, we focus on simple ambiguities in the remainder of this paper. To disambiguate the strings we use human raters.

Selection of Strings

Based on the list of ambiguous UMLS strings, we have selected 50 highly frequent ones for inclusion into the test collection. They are tabulated in Table 3. Some highly frequent strings were not included because the concepts they are mapped to were either difficult to distinguish or the UMLS did not provide informative and consistent definitions and (hierarchical) relationships.

The second and seventh columns provide the strings' frequency of occurrence in the 1998 MEDLINE citations. Columns three and eight provide the number of different senses, or UMLS concepts to which a string maps. For some cases, we do not use all concepts available in the UMLS because we judged some of them to be too close in sense to make a practical distinction. Columns 4 and 9 tabulate the number of concepts we discarded for each string. For instance, MetaMap maps the string depression to three different UMLS concepts: Depression motion, Depressive episode, unspecified, and Mental Depression. The latter
two concepts are very close in sense, so we decided to use only the second of the two, Mental Depression, since the UMLS vocabularies define this concept more clearly.

Footnote 2: See /senseval2/ for more information.

Table 3: Ambiguous Strings in the NLM’s WSD Test Collection. The italicized strings are those for which good agreement between raters was difficult to obtain. Excl R = rater excluded, and Excl S = number of senses excluded.

String Occurrences Senses Excl S Excl R | String Occurrences Senses Excl S Excl R
adjustment 2,596 4 2 | lead 9,880 3
association 18,531 3 | man 5,243 4
blood pressure 6,713 4 1 | mole 3,642 4 1
cold 2,448 6 | mosaic 569 5
condition 24,891 3 | nutrition 3,456 4 1
culture 20,635 3 1 | pathology 4,373 3 1
degree 17,419 3 | pressure 9,118 4 1
depression 7,577 3 1 | radiation 5,822 3
determination 36,779 3 | reduction 22,979 3
discharge 5,072 3 1 | repair 6,771 3 1
energy 7,327 3 1 | resistance 13,132 3
evaluation 19,319 3 1 | scale 6,734 4
extraction 10,831 3 | secretion 13,276 3 1
failure 7,989 3 | sensitivity 16,173 4
fat 6,112 3 | sex 7,214 4
fit 3,591 3 | single 29,311 3
fluid 5,991 3 | strains 15,873 3
frequency 16,244 3 1 | support 20,228 3
ganglion 580 3 | surgery 22,539 3 1
glucose 11,205 3 | transient 7,053 3
growth 20,712 3 | transport 10,018 3
immunosuppression 1,596 3 | ultrasound 5,704 3 1
implantation 4,170 3 | variation 10,431 3
inhibition 24,121 3 | weight 12,857 3
japanese 2,924 3 | white 4,384 3 1

For each string, we have added the sense “none”, which the raters can select when none of the available senses suits a particular instance. Following the depression example, there are two UMLS senses plus the “none” option, which leads to an ambiguity of degree three (Table 3, columns 3 and 8).

The discussion on which strings to use for the test collection and which senses to include for each string took place in a team of 11: the authors plus eight other researchers at the NLM with various backgrounds in library sciences, linguistics, medical informatics, and medicine. The members of this group also served as raters who disambiguated the instances.

For every one of the 50 strings, we selected 100 instances at random from the 1998 MEDLINE collection. Almost all of these instances originate from different
citations. Thus, there were 5,000 instances to be disambiguated.

Disambiguation Procedure

Since disambiguating 5,000 instances of ambiguity manually is a non-trivial task, we developed a web-based interface that facilitates the disambiguation procedure and reduces the actual manual task to two mouse clicks for each instance; see Figure 1 for a screenshot.

The left panel of the interface presents the string to be disambiguated in red. The sentence in which it occurs, the direct context, appears in a blue box. Additionally, the rest of the title and abstract of the MEDLINE citation is visible. The raters were permitted to address the strings in any order and were not required to complete a string before starting another. The order in which the 100 instances for every string were presented had been randomized for every user. The different concepts (senses) are available in the right panel. The rater can select only one concept (radio button) or pass the instance to reconsider it later. Concepts and their semantic types have hyperlinks to the UMLS.

Figure 1: Disambiguation User Interface. The left panel shows the MEDLINE citation as context to the raters to disambiguate the string cold. The possible senses (concepts), with hyperlinks to the UMLS, are in the right panel.

Analysis of Ratings

To reach a final classification on the correct sense, there are two approaches. The first one is majority voting: the sense that is selected by most raters will be the final and correct sense. The second method is latent class analysis (LCA) [13,14]. This statistical method tries to find the underlying and “true” classifications. It may be especially useful when majority voting results in a tie. For any particular instance, LCA uses the rating patterns of the other instances to decide which is the true and final classification. In addition to these methods, it may be interesting to find out to what extent raters agree and disagree with each other using the kappa (κ) statistic [15].

The determination of
the final classification is a four-step process. We repeated this process for all 50 strings.

During step one, we compute the κ statistic for each rater–rater combination. This statistic shows which raters agree with each other and, more importantly, which raters disagree systematically with all others. We use the latter information in step two.

In step two, we count the total ratings for each instance of the string. If there is a majority of two votes for a certain sense, this will be the final classification. In case of ties, or many majorities of one, it may be interesting to exclude a rater if this rater disagrees systematically with all the others.

We apply step three if step two does not yield satisfactory results for many instances of the string, i.e., there are many ties and majorities of one and excluding one (or more) raters does not improve results. For these cases, we use LCA to obtain a classification.

Step four is the reassessment, in a group discussion of the disambiguation team, of the instances that did not obtain a reliable classification in step two or step three.

RESULTS

Depending on the difficulty of the case, raters spent between thirty minutes and two hours per ambiguous string (100 instances). The rating task was done in addition to the raters’ normal tasks. After a period of four months, during which there were three meetings in which the group discussed examples of difficult strings and particular instances, the data were frozen. Eight raters completed all the 5,000 instances; the other three completed 2,800 (28 strings), 2,200 (22 strings), and 600 (6 strings), respectively.

The agreement analysis by the κ statistic provided many interesting insights. For instance, the two raters who agreed best for most of the 50 strings are both former NLM indexers (the only two in the team). Also, for many strings, one or two of the raters disagreed systematically with the rest of the group. By excluding them in eleven cases (columns 5 and 10 in Table 3) we are able to resolve ties and many majorities of one.
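The pairwise agreement and vote-counting steps (steps one and two) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure as actually implemented: it uses Cohen's kappa for each rater pair (the paper cites Fleiss [15] but does not give the exact formula used per pair), it reads "a majority of two votes" as a lead of at least two votes over the runner-up sense, and it assumes every rater labeled every instance. The function names `cohen_kappa`, `pairwise_kappa`, and `majority_sense` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' sense labels over the same instances."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of instances with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's own label distribution.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1 - expected)


def pairwise_kappa(ratings):
    """Step one: kappa for every rater-rater combination.

    `ratings` maps a rater name to a list of senses, one per instance.
    """
    return {
        (r1, r2): cohen_kappa(ratings[r1], ratings[r2])
        for r1, r2 in combinations(sorted(ratings), 2)
    }


def majority_sense(votes, margin=2):
    """Step two: winning sense if it leads by at least `margin` votes, else None.

    The margin of two is an interpretation of the text, not a confirmed rule.
    """
    counts = Counter(votes).most_common()
    if not counts:
        return None
    top_sense, top = counts[0]
    runner_up = counts[1][1] if len(counts) > 1 else 0
    return top_sense if top - runner_up >= margin else None


# Illustrative data only: three hypothetical raters, four instances of one string.
ratings = {
    "rater_a": ["sense1", "sense1", "sense2", "none"],
    "rater_b": ["sense1", "sense1", "sense2", "sense2"],
    "rater_c": ["sense1", "sense2", "sense2", "none"],
}
print(pairwise_kappa(ratings))
for votes in zip(*ratings.values()):
    print(votes, "->", majority_sense(votes))
```

A rater whose kappa values are low against all other raters would be the candidate for exclusion before recounting; instances for which `majority_sense` still returns `None` would go on to LCA or group discussion.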
Eight raters were excluded at least once. Steps 1 and 2 were sufficient for 38 strings. Only 162 of the 3,800 instances had to be discussed in the team for a final classification (step 4). The twelve remaining strings, written in italics in Table 3, were more problematic in that there are many ties and majorities of one. After using LCA, still 159 of the 1,200 instances had to be discussed in the group to reach a final classification.

DISCUSSION

At the National Library of Medicine, we have developed a test collection for word sense disambiguation research. This collection will hopefully prove valuable for the future developments of medical NLP tools. As a first step we will apply different machine learning algorithms to disambiguate a string based on its context. The definition of the context will be one of the major challenges. The test collection provides the PubMed ID, the sentence in which the string occurs, the syntactic tags of the words in the sentence and the concepts that are found in the sentences by MetaMap [7].
Included with the concepts are their semantic types; therefore the semantic context may be included in the feature list that can be used by the algorithms.

We observe a distinction between two types of strings in the test collection: normal and problematic ones. For the problematic ones, it was difficult to obtain agreement among the raters on which sense is the accurate disambiguation of many of a string’s instances. When human judgment is problematic, it may be impossible to automate disambiguation reliably. We therefore recommend first considering the 38 normal strings (3,800 instances) with ML algorithms before turning to the problematic ones.

By summer 2001, the WSD test collection will be available as a UMLS resource from the NLM at /.

ACKNOWLEDGMENTS

We would like to express our gratitude to the members of the disambiguation team: Florence Chang, Cliff Gay, Mike Hazard, Susanne Humphrey, Tom Rindflesch, Sonya Shooshan, Carolyn Tilley, and Sara Tybaert.

This research was supported in part by an appointment to the NLM Research Participation Program administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the NLM.

REFERENCES

[1] National Library of Medicine. PubMed, 2001. /entrez/query.fcgi?db=PubMed.
[2] National Library of Medicine. Unified medical language system knowledge sources, 2001. /.
[3] Roth L, Hole WT. Managing name ambiguity in the UMLS Metathesaurus. In: Proc AMIA Annu Fall Symp 2000. Philadelphia, PA: Hanley and Belfus, 2000; p. 1124.
[4] Friedman C. A broad-coverage natural language processing system. In: Proc AMIA Annu Fall Symp 2000. Philadelphia, PA: Hanley and Belfus, 2000; pp. 270–274.
[5] Aronson AR, Bodenreider O, Chang HF, Humphrey SM, Mork JG, Nelson SJ, et al. The NLM indexing initiative. In: Proc AMIA Annu Fall Symp 2000. Philadelphia, PA: Hanley and Belfus, 2000; pp. 17–21.
[6] Aronson AR. The effect of textual variation on concept based information retrieval. In: Proc AMIA Annu Fall Symp 1996. Philadelphia, PA: Hanley and
Belfus, 1996; pp. 373–377.
[7] Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: The MetaMap program. In: Proc AMIA Annu Fall Symp 2001. Philadelphia, PA: Hanley and Belfus, 2001; in press.
[8] Weeber M, Klein H, Aronson AR, Mork JG, de Jong-van den Berg LTW, Vos R. Text-based discovery in biomedicine: The architecture of the DAD-system. In: Proc AMIA Annu Fall Symp 2000. Philadelphia, PA: Hanley and Belfus, 2000; pp. 903–907.
[9] Weeber M, Vos R, Klein H, de Jong-van den Berg LTW. Using concepts in literature-based discovery: Simulating Swanson’s Raynaud–fish oil and migraine–magnesium discoveries. J Am Soc Inf Sci; 52(7):548–557.
[10] Swanson DR. Migraine and magnesium: Eleven neglected connections. Perspect Biol Med 1988; 31(4):526–557.
[11] Nadkarni P, Chen R, Brandt C. UMLS concept indexing for production databases. J Am Med Inform Assoc 2001; 8(1):80–91.
[12] Rindflesch TC, Aronson AR. Ambiguity resolution while mapping free text to the UMLS Metathesaurus. In: Proc Annu Symp Comput Appl Med Care 1994. Philadelphia, PA: Hanley and Belfus, 1994; pp. 240–244.
[13] Bruce RF, Wiebe JM. Recognizing subjectivity: A case study on manual tagging. Nat Lang Eng 1999; 5(2):187–205.
[14] Wiebe JM, Bruce RF, O’Hara TP. Development and use of a gold-standard data set for subjectivity classifications. In: Proc ACL 1999. Cambridge: The MIT Press, 1999; pp. 246–253.
[15] Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull 1971; 76(5):378–382.