Belief functions




Evidential Reasoning Theory (Introduction to Evidential Reasoning)


2. The plausibility function pl : 2^Ω → [0, 1] is defined as pl(A) = bel(Ω) − bel(Ā), the total mass that does not contradict A. Shafer: bel is 'normalized' ⇒ closed-world assumption ⇒ bel(Ω) = 1, pl(Ω) = 1, m(∅) = 0.
Observations in belief functions
Belief functions
Ω is the frame of discernment (elements of the set Ω are called 'worlds'). There is one "actual world" ω0. But which one? An agent can only express the strength of his or her opinion, called belief: a degree of belief that the actual world belongs to this or that subset of Ω. A Shafer belief function is a map bel : 2^Ω → [0, 1], where bel(A) denotes the strength of the agent's belief that ω0 ∈ A. bel satisfies the following inequalities: bel(∅) = 0, bel(Ω) = 1, and for all A1, …, An ⊆ Ω, bel(A1 ∪ … ∪ An) ≥ Σ_i bel(A_i) − Σ_{i<j} bel(A_i ∩ A_j) + … + (−1)^{n+1} bel(A1 ∩ … ∩ An).
A basic belief mass m(A) is given to a set A when the evidence supports A without supporting any strictly more specific proposition; the mass given to A also supports that the actual world is in every superset of A (every proposition that contains A). The degree of belief bel(A) quantifies the total amount of justified specific support given to A. We say justified because we include in bel(A) only the basic belief masses given to subsets of A: the mass m({x, y}) given to {x, y} could come to support {x} if further information indicated this, but given the available information it can only be given to {x, y}. We say specific because the basic belief mass m(∅) is not included in bel(A), as it is given to the subset ∅.
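In code, bel and pl are straightforward sums over the focal sets of a basic belief assignment m. A minimal Python sketch (the mass values are invented for illustration):

```python
def bel(m, A):
    """Total justified specific support for A: the sum of the
    basic belief masses given to non-empty subsets of A."""
    return sum(v for B, v in m.items() if B and B <= A)

def pl(m, A):
    """Plausibility of A: the mass not committed against A, i.e.
    the sum over focal sets that intersect A."""
    return sum(v for B, v in m.items() if B & A)

# An illustrative basic belief assignment on Ω = {x, y, z}
# (the numbers are invented for the example):
m = {frozenset({'x'}): 0.5,
     frozenset({'x', 'y'}): 0.3,
     frozenset({'x', 'y', 'z'}): 0.2}

A = frozenset({'x', 'y'})
print(bel(m, A))  # 0.8: only m({x}) and m({x,y}) lie inside A
print(pl(m, A))   # 1.0: every focal set intersects A
```

Note that bel(A) + bel(Ā) need not sum to 1: here bel({z}) is 0 even though pl({z}) is 0.2.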


Dempster-Shafer Theory


Glenn Shafer¹

The Dempster-Shafer theory, also known as the theory of belief functions, is a generalization of the Bayesian theory of subjective probability. Whereas the Bayesian theory requires probabilities for each question of interest, belief functions allow us to base degrees of belief for one question on probabilities for a related question. These degrees of belief may or may not have the mathematical properties of probabilities; how much they differ from probabilities will depend on how closely the two questions are related.

The Dempster-Shafer theory owes its name to work by A. P. Dempster (1968) and Glenn Shafer (1976), but the kind of reasoning the theory uses can be found as far back as the seventeenth century. The theory came to the attention of AI researchers in the early 1980s, when they were trying to adapt probability theory to expert systems. Dempster-Shafer degrees of belief resemble the certainty factors in MYCIN, and this resemblance suggested that they might combine the rigor of probability theory with the flexibility of rule-based systems. Subsequent work has made clear that the management of uncertainty inherently requires more structure than is available in simple rule-based systems, but the Dempster-Shafer theory remains attractive because of its relative flexibility.

The Dempster-Shafer theory is based on two ideas: the idea of obtaining degrees of belief for one question from subjective probabilities for a related question, and Dempster's rule for combining such degrees of belief when they are based on independent items of evidence.

To illustrate the idea of obtaining degrees of belief for one question from subjective probabilities for another, suppose I have subjective probabilities for the reliability of my friend Betty. My probability that she is reliable is 0.9, and my probability that she is unreliable is 0.1. Suppose she tells me a limb fell on my car. This statement, which must be true if she is reliable, is not necessarily false if she is unreliable. So her testimony alone justifies a 0.9 degree of belief that a limb fell on my car, but only a zero degree of belief (not a 0.1 degree of belief) that no limb fell on my car. This zero does not mean that I am sure that no limb fell on my car, as a zero probability would; it merely means that Betty's testimony gives me no reason to believe that no limb fell on my car. The 0.9 and the zero together constitute a belief function.

To illustrate Dempster's rule for combining degrees of belief, suppose I also have a 0.9 subjective probability for the reliability of Sally, and suppose she too testifies, independently of Betty, that a limb fell on my car. The event that Betty is reliable is independent of the event that Sally is reliable, and we may multiply the probabilities of these events: the probability that both are reliable is 0.9 × 0.9 = 0.81, the probability that neither is reliable is 0.1 × 0.1 = 0.01, and the probability that at least one is reliable is 1 − 0.01 = 0.99. Since they both said that a limb fell on my car, at least one of them being reliable implies that a limb did fall on my car, and hence I may assign this event a degree of belief of 0.99.

Suppose, on the other hand, that Betty and Sally contradict each other: Betty says that a limb fell on my car, and Sally says no limb fell on my car. In this case, they cannot both be right and hence cannot both be reliable; only one is reliable, or neither is reliable. The prior probabilities that only Betty is reliable, only Sally is reliable, and that neither is reliable are 0.09, 0.09, and 0.01, respectively, and the posterior probabilities (given that not both are reliable) are 9/19, 9/19, and 1/19, respectively. Hence we have a 9/19 degree of belief that a limb did fall on my car (because Betty is reliable) and a 9/19 degree of belief that no limb fell on my car (because Sally is reliable).

In summary, we obtain degrees of belief for one question (Did a limb fall on my car?) from probabilities for another question (Is the witness reliable?). Dempster's rule begins with the assumption that the questions for which we have probabilities are independent with respect to our subjective probability judgments, but this independence is only a priori; it disappears when conflict is discerned between the different items of evidence.

Implementing the Dempster-Shafer theory in a specific problem generally involves solving two related problems. First, we must sort the uncertainties in the problem into a priori independent items of evidence. Second, we must carry out Dempster's rule computationally. These two problems and their solutions are closely related. Sorting the uncertainties into independent items leads to a structure involving items of evidence that bear on different but related questions, and this structure can be used to make computations feasible. Suppose, for example, that Betty and Sally testify independently that they heard a burglar enter my house. They might both have mistaken the noise of a dog for that of a burglar, and because of this common uncertainty, I cannot combine degrees of belief based on their evidence directly by Dempster's rule. But if I consider explicitly the possibility of a dog's presence, then I can identify three independent items of evidence: my other evidence for or against the presence of a dog, my evidence for Betty's reliability, and my evidence for Sally's reliability. I can combine these items of evidence by Dempster's rule, and the computations are facilitated by the structure that relates the different questions involved.

For more information, see Shafer (1990) and the articles on belief functions in Shafer and Pearl (1990).

¹ Ronald G. Harper Distinguished Professor of Business, School of Business, Summerfield Hall, University of Kansas, Lawrence, Kansas 66045.

References

Dempster, A. P. (1968). A generalization of Bayesian inference. Journal of the Royal Statistical Society, Series B, 30, 205–247.

Shafer, Glenn (1976). A Mathematical Theory of Evidence. Princeton University Press.

Shafer, Glenn (1990). Perspectives on the theory and practice of belief functions. International Journal of Approximate Reasoning, 3, 1–40.

Shafer, Glenn, and Judea Pearl, eds. (1990). Readings in Uncertain Reasoning. Morgan Kaufmann.
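The Betty-and-Sally examples above can be reproduced numerically. Below is a minimal, illustrative Python sketch of Dempster's rule (the function and variable names are my own, not from the article): each testimony becomes a mass function over the frame {limb, no limb}, and combination multiplies masses, assigns each product to the intersection of the focal sets, drops the mass on the empty set (the conflict), and renormalizes.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: multiply masses over pairs of focal sets,
    give each product to the intersection, discard the mass that
    falls on the empty set, and renormalize by 1 - conflict."""
    combined, conflict = {}, 0.0
    for (A, v1), (B, v2) in product(m1.items(), m2.items()):
        C = A & B
        if C:
            combined[C] = combined.get(C, 0.0) + v1 * v2
        else:
            conflict += v1 * v2
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule undefined")
    return {C: v / (1.0 - conflict) for C, v in combined.items()}

LIMB, NO_LIMB = frozenset({'limb'}), frozenset({'no limb'})
FRAME = LIMB | NO_LIMB

# Betty and Sally both say a limb fell (each reliable with prob. 0.9):
betty = {LIMB: 0.9, FRAME: 0.1}
sally = {LIMB: 0.9, FRAME: 0.1}
print(dempster_combine(betty, sally)[LIMB])  # ≈ 0.99

# Betty and Sally contradict each other:
sally2 = {NO_LIMB: 0.9, FRAME: 0.1}
m = dempster_combine(betty, sally2)
print(m[LIMB], m[NO_LIMB])  # ≈ 0.4737 each, i.e. 9/19 and 9/19
```

The 0.81 of mass on the impossible "both reliable" combination is exactly the conflict that gets renormalized away in the contradictory case, which is why the result matches the 9/19 posterior computed in the text.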



Combining Belief Functions When Evidence Conflicts


Ž.Decision Support Systems2920001–9 r locate r dsw Combining belief functions when evidence conflictsCatherine K.MurphyPenn State York,York,PA17403,USAAccepted6December1999AbstractThe use of belief functions to represent and to manipulate uncertainty in expert systems has been advocated by some practitioners and researchers.Others have provided examples of counter-intuitive results produced by Dempster’s rule for combining belief functions and have proposed several alternatives to this rule.This paper presents another problem,the failure to balance multiple evidence,then illustrates the proposed solutions and describes their limitations.Of the proposed methods,averaging best solves the normalization problems,but it does not offer convergence toward certainty,nor a probabilistic basis.To achieve convergence,this research suggests incorporating average belief into the combining rule. q2000Elsevier Science B.V.All rights reserved.Keywords:Belief functions;Expert systems;Decision analysis;Uncertain reasoning;Dempster–Shafer theory1.IntroductionA complicating factor in developing decision sup-port and expert systems is the handling of uncertain judgments.An expert may be unable to provide a definite answer to the question of interest.A number of basic approaches to the management of uncer-tainty in these systems have been developed.These include certainty factors,developed during the work w x w x on MYCIN3;Bayesian theory,fuzzy logic19, and belief functions,also known as Dempster–Shafer Ž.w xD-S theory5,11.Currently,there is no universally accepted method for uncertainty management.Each method has advantages and weaknesses associated w xwith it8,16.Two reasons for considering belief functions for the combination of evidence are their flexibility andŽ.E-mail address:cxm53@ C.K.Murphy.ease of use by the decision maker.Belief functionsw xfit into valuation-based systems17.Belief can be assigned to sets,not just to individual elements.In contrast with Bayesian decision 
modeling, belief functions do not require the expert to provide a set of prior probabilities. Also, belief in a hypothesis and its negation need not sum to 1; some belief can be assigned to the base set, as a measure of uncertainty. This approach is similar to the evidence-gathering process observed when people reason at different levels of abstraction [7]. Pearl [10] agrees that this approach is well suited to knowledge elicitation: "an expert may feel more comfortable describing the impact of an evidence in terms of weight assignment to classes rather than to individual points."

Offsetting this ease of use are problems associated with the combination of belief functions. Combination may yield conclusions different from what we expect or consider reasonable. In this paper, we examine the combining of contradictory evidence using belief functions. Our aim is to develop a useful alternative to the combination rule. We will compare the answers that proposed alternatives provide for dilemmas, both previously and newly identified. In the next section, we review the belief function theory and some alternative approaches for combining evidence. Following that, we evaluate the performance of those alternatives in the context of a new problem, as well as a familiar one. The results of the comparison lead to the recommendation of an approach for dealing with contradictory evidence.

2. Belief functions

The base set, or frame of discernment, Θ, for belief functions consists of a set of mutually exclusive and exhaustive hypotheses. The mass, m, of belief in an element of Θ can range from 0 to 1, representing how strongly the evidence supports the hypothesis, without supporting a more specific hypothesis. This mass of belief is analogous to the weight in favor of the hypothesis. The mass, or basic probability assignment, measures the amount of belief attributed directly
to an element; it does not include the mass attributed to any subsets of the element. In contrast, the belief function, Bel, of a set does include mass that has been specifically assigned to the element's subsets. A subset is called a focal element of belief if its mass is greater than zero. The sum of the masses of belief in the base set, Θ, and its subsets is 1, since the base set includes the answer to the question of interest. The mass of the null set (∅) is defined as zero in the Dempster–Shafer framework. In contrast, Smets [14] argues that, under an open-world assumption, the mass of the null set may be nonzero if the base set does not contain the correct answer.

Belief can be assigned to a general conclusion, such as bacteria, as well as to singleton elements, such as pneumococcus. In addition, all belief need not be assigned to specific sets. The remaining mass, which represents uncertainty, or ignorance, is assigned to the base set, Θ. This mass could possibly belong to any of the subsets. Its existence makes possible a range of belief for every subset, [mass, mass + ignorance]. The range, or plausibility, of a subset extends from its mass to the mass assigned to all sets which include it.

2.1. Combining evidence

In an expert system with IF–THEN rules, either the premises (evidence) or the conclusions may be uncertain. Rules are triggered when the evidence, i.e., test results, referenced in their premises becomes available. When D-S theory is used in a rule-based expert system, applying a rule results in the assignment of masses of belief to the elements of the rule's conclusion. Dempster's [5] rule of combination handles the successive application of rules. Given two rules, based on independent evidence, the orthogonal sum, ⊕, of their mass functions computes the degree of belief for the combined rules. Dempster's rule, shown in Eq. (1), varies the distributions, X and Y, from the two rules over all subsets of Θ and combines those elements where X ∩ Y = Z. The set intersections represent areas where the conclusion
of one rule agrees with that of the rule being combined with it.

    (m1 ⊕ m2)(Z) = [ Σ_{X ∩ Y = Z} m1(X) m2(Y) ] / (1 - k)    (1)

    where k = Σ_{X ∩ Y = ∅} m1(X) m2(Y)

k is the mass that the combination assigned to the null subset. It represents contradictory evidence. Dividing the other mass functions by 1 - k normalizes them. The process reapportions among the other subsets the belief that was originally assigned to the null set. When k = 1, i.e., when none of the combining masses intersect, the function is undefined.

Attractive features of the combining function are [13]:
1. Concordant items of evidence reinforce each other.
2. Conflicting items of evidence erode each other.
3. A chain of reasoning is weaker than its weakest link.

The first feature is accomplished by reassigning mass in the null set to the focal elements. When the mass in the null set is very large, as occurs when conclusions disagree, reassigning it can cause problems.

Let us look at the operation of the combining function when the combining rules are in substantial agreement. Suppose that a rule confirms {A} to 0.4, {A or B} to 0.2, and {C} to 0.4. An expert system combines that rule with a second, which confirms {A} to 0.7 and assigns the remaining mass to ignorance. The D-S combination of the two rules consists of the products of the masses from the rules' conclusions.
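As a concrete check, this combination can be sketched in a few lines of Python. The sketch is written for this text, not taken from the paper; the `dempster_combine` helper and the set names are invented for the example.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts mapping frozenset -> mass) by
    Dempster's rule, Eq. (1): products whose focal elements have an empty
    intersection form the conflict k; the rest are divided by 1 - k."""
    combined = {}
    k = 0.0  # mass that falls on the null set (conflicting evidence)
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            combined[z] = combined.get(z, 0.0) + mx * my
        else:
            k += mx * my
    if k >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {z: v / (1.0 - k) for z, v in combined.items()}, k

# The two rules of the running example.
A, A_or_B, C = frozenset("A"), frozenset("AB"), frozenset("C")
theta = frozenset("ABC")  # base set (frame of discernment)
rule1 = {A: 0.4, A_or_B: 0.2, C: 0.4}
rule2 = {A: 0.7, theta: 0.3}

combined, k = dempster_combine(rule1, rule2)
print(round(k, 2))            # 0.28 -- mass initially assigned to the null set
print(round(combined[A], 2))  # 0.75 -- above either rule's own 0.4 and 0.7
```

Running this reproduces the combination described above: a conflict of 0.28 and a normalized belief of 0.75 in {A}.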
The mass products are assigned to the intersection of the combining sets. Table 1 indicates the set assignments of the combined masses.

The conflicting values, the products whose combining masses have no intersection, are assigned to the null set. The combined mass of belief in A is 0.54 before normalization and 0.75 after normalization, in which all evidential masses were divided by 1 minus the mass of the null set (1 - 0.28 = 0.72). The mass assigned to A after normalization is greater than that assigned by either combining rule; normalization produces convergence toward the dominant opinion. The ignorance interval disappeared in the combination.

2.2. Problems and solutions in combining evidence

The rules in Table 1 assigned similar plausibilities to the major element {A}. In the opposite situation, when evidential rules differ substantially in their conclusions, the combining rule of D-S theory may produce answers that disagree with the evidence. Two problems cited in the literature are the following.

(1) D-S combination can assign 100% certainty to a minority opinion [20]. Table 4 illustrates this problem.

(2) The "ignorance" interval disappears forever whenever a single piece of evidence imparts all its weight to a proposition and its negation, giving the false impression that precise probabilistic information underlies the belief [10]. We can see the latter effect in Table 1. The conclusion of rule 2 contained an interval which represented ignorance, the mass assigned to the base set. After combination with rule 1, which assigned all its weight to specific sets, the resultant mass assigned to the base set Θ was zero.

(3) Elements of sets with larger cardinality can gain a disproportionate share of belief [15]. Section 3.5 presents this example.

Because of these problems, several alternatives to the normalization process have been proposed.

(1) Allow mass in the null set and, thus, eliminate the need for normalization [6]. This eliminates division by 1 - k from Eq. (1), Dempster's rule.

(2) Assign the mass in the null set to the base set
Θ [18]. Since the correct destination of the conflicting evidence is unknown, distribute it among all the elements, rather than just the elements which happen to be intersections of the combining masses.

(3) Average the masses assigned to a subset Z to determine its belief function, Bel [18]. Eq. (2) covers the case where two rules are combined.

    Bel(Z) = (1/2) [ Σ_{X ⊆ Z} m1(X) + Σ_{Y ⊆ Z} m2(Y) ]    (2)

Clarke suggests that classical estimation theory may provide the best solution in the form of the means and standard deviations of the beliefs. Furthermore, an awareness of the problems caused by assigning zero, rather than very small, beliefs could reduce the incidence of paradoxes in D-S combination [4].

Table 1. Products of combining two mass functions using Dempster's rule

                      {A} 0.40     {A or B} 0.20     {C} 0.40      (Rule 1 -> m1)
Rule 2 -> m2
  {A} 0.7             {A} 0.28     {A} 0.14          ∅ 0.28*
  Θ 0.3               {A} 0.12     {A or B} 0.06     {C} 0.12

Combined mass:               {A} 0.54   {A or B} 0.06   {C} 0.12   ∅ 0.28
Combined mass (normalized):  {A} 0.75   {A or B} 0.08   {C} 0.17

*Conflicting mass, assigned to the null set. Combined mass: sum of products whose intersection is the given element. Combined mass (normalized): masses of the focal elements after allocation of the mass in the null set.

Table 2. The silent majority

                                 {A}     {A or B}   {B}     {C}     {A or C}   Θ      ∅
First two rules                  0.75    0.08       0       0.17    0          0      0
Rule 3                           0       0          0.5     0.5     0          0      0
Combined mass                    0       0          0.04    0.085   0          0      0.875
Combined mass (normalized)       0       0          0.32    0.68    0          0      0
No normalization, three rules
  (lower bound)                  0       0          0.03    0.06    0          0      0.91
Mass -> union, three rules
  (upper bound)                  0       0.21       0       0       0.28       0.51   0
Average (three rules)            0.367   0.067      0.167   0.3     0          0.1    0

(4) Oblow [9] proposed that intersection and union operators constitute a lower and an upper bound, respectively, for the mass, mc, of the combined information. The intersection operator is Dempster's combination rule without normalization.

    mc(Z) = Σ_{X ∩ Y = Z} m1(X) m2(Y)    (lower bound)    (3)

The union operator is defined as:

    mc(Z) = Σ_{X ∪ Y = Z} m1(X) m2(Y)    (upper bound)    (4)

The union operator of Eq. (4) assigns mass products to the union of
the subsets of the elements being considered, rather than the intersection.

The preceding four alternatives to Dempster's rule have been proposed because combining conflicting evidence may produce counterintuitive results. In addition, several authors have noted that it may be useful to track mass in the null set because it represents conflicting evidence. In particular, Zeigler [21], in the context of describing a propositional evidence accumulator, suggested that the amount of contradiction be used to trigger a warning to the user when it exceeds a preset value.

Although not designed to handle conflicting evidence, the iterative assignment method proposed by Baldwin [1,2] represents another approach for combining belief functions. Rather than dividing all the intersection products by a single normalizing constant (1 - k in Eq. (1)), the method calculates a separate normalizing constant for each column that has an intersection product corresponding to mass in the null set. That mass is set to zero and reallocated to the focal elements in the column. The result is that the focal elements in each column sum to the mass that the rule assigned to that column. For example, in Table 1, the normalizing constant for the third column would be 0.3; dividing the mass assigned to {C}, 0.12, by 0.3 gives {C} a normalized mass of 0.4. This assignment represents only the first iteration, as this method repeatedly combines the evidence until the solution converges.

Table 3. Effect of one strongly confirming rule

                                   {A}     {A or B}   {B}     Θ      ∅
Rule 1 (result of four rules)      0.5     0          0.5     0      0
Rule 5                             0.9     0          0       0.1    0
Combined mass                      0.50    0          0.05    0      0.45
Combined mass (normalized)         0.91    0          0.09    0      0
Combined mass w/o normalization
  for five rules (lower bound)     0.063   0          0.006   0      0.931
Mass -> union for five rules
  (upper bound)                    0.056   0.844      0       0.1    0
Average                            0.58    0          0.4     0.02   0
Combined mass based on average     0.856   0          0.144   0      0

Table 4. Normalization alternatives for 100% certainty assigned to a minority opinion

                                 {A}    {B}    {C}    {A,B}   {A,C}   {B,C}   Θ     ∅
Rule 1                           0.9    0      0.1    0       0       0       0     0
Rule 2                           0      0.9    0.1    0       0       0       0     0
D-S combined mass (normalized)   0      0      1.00   0       0       0       0     0
D-S w/o normalization
  (lower bound)                  0      0      0.01   0       0       0       0     0.99
Mass -> union (upper bound)      0      0      0.01   0.81    0.09    0.09    0     0
Average                          0.45   0.45   0.10   0       0       0       0     0

If the evidence is incompatible, as is the case for the examples in Tables 1–4, the solution still converges, but it does not retain all the evidence. The iterative assignment method provides a measure of compatibility, the support pair for each focal element. The support interval for an element is bracketed by its mass and its plausibility. When two rules are combined, the necessary support for an element is defined as its maximum mass in the combining rules. Its possible support is its minimum plausibility in the rules. For the rules in Table 1, the support pair for {A} is [0.7, 0.6], indicating that one expert assigns a higher mass to {A} than the second expert considers plausible. When the evidence conflicts, as in this example, the iterative assignment method will lose evidence in its combination. In the next three examples, we will analyze only the four normalization alternatives that are suitable for conflicting evidence. In Section 3.5, the cardinality problem, we will include the solution provided by the iterative assignment method.

3. Problems and normalization alternatives

The set of problems identified with the combining rule implies that the following properties are expected in any alternative method for combining conflicting evidence.
1. Assign the preponderance of belief to a majority opinion, not a minority one.
2. Indicate, if possible, an appropriate level of ignorance.
3. Provide a combined belief that reflects the relative strengths and frequencies of the individual estimates.

A further requirement is that the method satisfy commutative and associative properties, so that the order and grouping of evidence do not affect the result.

In considering what have been described as the limitations of the combining rule, we will distinguish between: (1) combinations which lack some
desirable feature and (2) combinations which produce errors. An example of the former is the disappearance of the ignorance interval, as shown in Table 1. This is not a misleading result. We can see that the estimate is uncertain despite the absence of mass in the base set, because the combined mass resides in two distinct subsets. One could argue that the real value of the ignorance interval is its simplification of knowledge acquisition, and that the presence of an ignorance interval in the result is of secondary importance. In this paper, we will examine solutions for those cases which represent errors in the combination of evidence.

Of the proposed alternatives, we eliminate the one that assigns mass in the null set to the base set. It gives plausible answers, but its operation is not associative. Changing the grouping of a series of rules (the order in which they fire) would change the final mass assignments. In addition, assigning a large mass to the base set, or environment, paves the way for another problem. The next rule which fired would be in a position to establish certainty, prematurely.
Because of these flaws, we will not consider further the alternative of assigning conflicting mass to the base set. The other alternatives are both associative and commutative in their operations, as is Dempster's combination rule.

In the next sections, we examine problems associated with multiple rule combinations and tabulate the solutions provided by the proposed normalization alternatives. These problems have not been described in the literature. The first problem is the loss of a majority opinion because of a single dissenting rule which assigns no plausibility to the majority's subset. The table of alternative solutions includes the mass in the null set so that we can judge its effectiveness in warning of possible errors.

3.1. Loss of the majority opinion in multiple combinations

The combination rule works when the combining rules are in substantial agreement, as in the example from Table 1. Suppose we combine the result with another rule, rule 3, which assigns all of its mass to the minority sets: 0.50 to B and 0.50 to C. The result of the combination, as well as the results produced by alternative methods, is shown in Table 2.

The combined mass in Table 2 shows no mass in {A}, which contained the majority of the mass in the previous two rules. Moreover, no amount of corroborating evidence can resurrect the belief in a majority opinion if the system ever uses a rule which assigns all mass to sets which contradict the majority. The system in Table 2 assigns 68% of its belief to C, yet C's average plausibility is only 40%.

In Table 2, the amount of mass in the null set before normalization, 0.875, warns that the new evidence is in conflict with an established pattern; however, the upper and lower bounds are not useful in this example. The lower bound is near zero for all the subsets. The minimum combining mass would be a more informative lower bound for the major elements than Dempster's rule without normalization.
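The loss of the majority opinion is easy to reproduce numerically. The following sketch is illustrative Python written for this text, not code from the paper; the helper functions are invented for the example. It combines the Table 1 result with rule 3 and contrasts the outcome with the simple average.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule (Eq. (1)) over mass functions keyed by frozenset."""
    combined, k = {}, 0.0
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            combined[z] = combined.get(z, 0.0) + mx * my
        else:
            k += mx * my  # conflicting mass goes to the null set
    return {z: v / (1.0 - k) for z, v in combined.items()}, k

def average(rules):
    """Equal-weight average of a list of mass functions."""
    avg = {}
    for m in rules:
        for s, v in m.items():
            avg[s] = avg.get(s, 0.0) + v / len(rules)
    return avg

A, A_or_B, B, C = (frozenset(s) for s in ("A", "AB", "B", "C"))
theta = frozenset("ABC")
rule1 = {A: 0.4, A_or_B: 0.2, C: 0.4}
rule2 = {A: 0.7, theta: 0.3}
rule3 = {B: 0.5, C: 0.5}                 # the single dissenting rule

m12, _ = dempster_combine(rule1, rule2)  # majority belief: m12[A] = 0.75
m123, k = dempster_combine(m12, rule3)
print(m123.get(A, 0.0))   # 0.0 -- the majority opinion is wiped out for good
print(round(k, 3))        # 0.875 -- the large null-set mass is the only warning
print(round(average([rule1, rule2, rule3])[A], 3))  # 0.367 -- the average keeps A
```

Because rule 3 gives {A} zero plausibility, no later evidence can restore belief in A under Dempster's rule, while the average still records it.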
In contrast, the weighted average describes the distributions, and its disagreement with the normalized masses also functions as a warning. The average shows that the accumulated evidence does not justify convergence to any conclusion.

3.2. Attaining certainty in multiple combinations

The converse of the loss of a majority opinion is the rapid movement toward certainty when a body of conflicting evidence is overridden by a single piece of strongly confirming evidence, as illustrated in Table 3. Suppose that a series of four rules have divided belief between A and B equally; however, the next combining rule confirms A to 0.9.

The certainty expressed for {A} in rule 5 increases in the combination despite the presence of 50% disconfirming evidence in the previous four rules. The failure of belief functions to weight conclusions by the number of contributing rules becomes a problem when most of the rules divide belief fairly equally among competing propositions. Although the conclusion may be correct, the decision maker deserves a warning that the level of certainty is not so high as the combined mass indicates. Whether this turns out to be a problem depends on which rule is representative of the evidence. With only 45% conflict, this example demonstrates the difficulty of establishing a warning level for the mass in the null set that would be appropriate for all situations. The weighted average gives a clearer picture of the rules' disagreement. The average mass in {A} is high enough (58%) to motivate the question of how to reach a conclusion based on averages. The final line in Table 3, discussed in the next section, addresses that question.

3.3. Achieving certainty with averages

Averaging provides an accurate record of contributing beliefs, but it lacks convergence. Unlike Dempster's rule, it does not increase the measure of belief in the dominant subset. To provide convergence with an averaging method, we insert the average values of the masses in the combining rule. If
there are n rules, or pieces of evidence, use Eq. (1) to combine the weighted averages of the masses n - 1 times. We can, thus, avoid overdependence on a single piece of conflicting evidence, such as one causing the disappearance of the majority opinion. This use of the average in Table 3 leads to a combined belief in A of 0.856, compared with 0.91 from the individual application of the rules. In general, combining an average gives a less extreme answer. Others have noted that combining belief functions leads to a paradox: the greater is the conflict between pieces of evidence, the greater is the "certainty" on combination. Using an average value reduces this effect.

The calculation in Table 3 assumes that rule 1 represents five rules. If rule 1 were a single rule, one combination of the average masses from rules 1 and 5 would yield a mass of 0.861 for A (also less than the original 0.91). This result is typical of the combining rule: combining two identical rules with 0.7 mass in {A} yields a lower value than combining two rules with the differing masses of 0.5 and 0.9. This property, in itself, may constitute another incongruity of the normalization process.

3.4. Solving a classic problem

In this section, we tabulate the results of applying the D-S combining rule and the proposed normalization alternatives to a problem involving only two rules: Zadeh's [20] example of 100% certainty assigned to a minority belief.

The headings {A,B}, {A,C}, and {B,C} in Table 4 refer to unions of two singleton sets; these headings were included to illustrate the union operator. The mass in the null set before normalization is shown to highlight its importance as a warning. All of the alternative approaches are adequate for this extreme example of a normalization problem.
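Zadeh's example is small enough to check in code. The sketch below is illustrative Python written for this text (the helper is invented for the example); it reproduces the Table 4 rows for Dempster's rule and for the average.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule (Eq. (1)) over mass functions keyed by frozenset."""
    combined, k = {}, 0.0
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            combined[z] = combined.get(z, 0.0) + mx * my
        else:
            k += mx * my
    return {z: v / (1.0 - k) for z, v in combined.items()}, k

A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
rule1 = {A: 0.9, C: 0.1}   # first expert: strong belief in A
rule2 = {B: 0.9, C: 0.1}   # second expert: strong belief in B

combined, k = dempster_combine(rule1, rule2)
print(round(k, 2))             # 0.99 -- nearly total conflict
print(round(combined[C], 2))   # 1.0  -- all certainty goes to the minority C

avg = {s: (rule1.get(s, 0) + rule2.get(s, 0)) / 2 for s in (A, B, C)}
print([round(avg[s], 2) for s in (A, B, C)])  # [0.45, 0.45, 0.1] -- no false certainty
```

Only the 0.1 × 0.1 product survives the intersection step, so after normalization the minority opinion C receives all the belief, while the average records the actual disagreement.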
The alternative of using the average and the alternative of assigning the mass to the union also furnish a record of the conflicting evidence. The latter operation gives an upper bound for {A} and {B} equal to their maximum mass in the two combining rules; however, the upper bound for {C}, 0.19, is higher than its maximum, 0.1. Before normalization, the mass in the null set is 0.99, a definite warning of a false conclusion, given that the combined mass does not reflect the divergence in the combining distributions. The equal division between A and B in the average shows that the combining evidence does not yield a definite conclusion.

3.5. Unearned belief

Voorbraak [15] identified a weakness in a feature usually considered to be a strength of belief functions: probability assigned to a set is not divided among its elements but remains with all the elements. In combination with other evidence, this can result in an element of the multi-element set receiving a larger belief than seems justified. He provides this example.

Rule 1: {A} = 0.5 = {B or C}; Rule 2: {A or B} = 0.5 = {C}.

Result after combining rules 1 and 2: {A} = {B} = {C} = 1/3.

The equal distribution of belief is counterintuitive because both {A} and {C} had individually assigned mass, as well as a share with {B}; but {B} had only two shared masses. Because the problem occurs in the intersection operation, omitting the normalization step doesn't change the relative assignments.

However, averaging the masses yields:

    m(A) = m(C) = m(A or B) = m(B or C) = 0.25.

Averaging followed by D-S combination gives:

    m(A) = m(C) = 0.3, m(B) = 0.2, m(A or B) = m(B or C) = 0.1.

This result assigns a higher mass to {A} and {C} than to {B}, although it assigns the same plausibility, 0.4, to {A}, {B}, and {C}.

The iterative assignment method converges to m(A) = m(C) = 0.5, m(B) = 0.0, given the support
pairs for the rules: S(A) = [0.5, 0.5] = S(C); S(B) = [0.0, 0.5]. Both this method and averaging achieve a satisfactory solution to this problem.

4. Evaluation of normalization alternatives

All of the normalization alternatives avoid the counter-intuitive results produced by Dempster's rule when conflicting evidence is present; however, averaging lacks correspondence with Bayesian conditioning. In addition, each of the alternative approaches has other drawbacks.

Mass in the null set warns of problems; however, setting an appropriate level is problematic. We have seen an example (Table 3) where a warning is indicated, but the mass in the null set is only 45%. One of the combinations in Shafer and Tversky's [12] anthropological example contains 75% mass in the null set; however, the final result is compatible with the evidence. At extreme levels, greater than 90%, null mass is a reliable indicator of problems.

The method that does not normalize allocates a decreasing mass to the focal elements in successive combinations. Using the intersection operator without normalization to represent a lower bound on the belief results in a belief which approaches zero as the number of combinations increases. Coupling this with an upper bound supplied by the union operator results in a constantly increasing gap between belief and plausibility as combinations continue. Thus, there is the paradox that as evidence accumulates, ignorance increases. As the number of combinations increases, the belief assigned to ignorance approaches one. Upper and lower bounds, calculated in this manner, provide little information about the more weighty focal elements. The maximum and minimum functions are more informative bounds.

Similarly, if mass were allowed in the null set with the assumption that there are unknown propositions, these unknown elements would accumulate an increasingly larger share of the belief as evidence combination proceeded. This would occur even if the evidence supported one or more of
the known propositions.

The average offers an account of the evidence; therefore, it is useful as a reference even when it is not the primary combination method. Substantial differences between the average and the combined belief indicate problems in the combination. The averaging method does not converge to a conclusion; however, we can incorporate convergence by using the combining rule repeatedly with the average values. Moreover, in the averaging process, the decision maker can easily weight evidence by its perceived importance or reliability. Of the proposed methods, only averaging preserves a record of uncertainty (mass assigned to the base set) and the relative frequencies of beliefs.

From the alternatives, we can construct two practical methods.

(1) If all the evidence is available at the same time, average the masses, and calculate the combined masses by combining the average values multiple times.

(2) If decisions are made as evidence accumulates, using Dempster's rule with the individual evidential masses is simpler than recomputing averages. Use guidelines based on either (a) mass in the null set or (b) average masses to warn of possible problems caused by conflicting evidence. A guideline based on average masses summarizes the evidence and is more informative than one based on mass in the null set.

5. Conclusion

When conflicting evidence is present, Dempster's rule for combining beliefs often produces results that do not reflect the actual distribution of beliefs. Three types of problems were previously identified; two other types are associated with the successive combination of rules: (1) a single rule forcing certainty and (2) a single rule overruling a majority opinion. Of the alternative methods that address the problems, averaging identifies combination problems, shows the distribution of belief, and preserves a record of ignorance (unassigned belief).

In selecting an uncertainty management system for an expert system, two characteristics to avoid are: (1) assigning
misleading certainty to a recommendation and (2) failing to attain certainty when it is justified. Analysis of examples in four problem areas showed that the method of averaging avoids the first error and accounts for all combining rules; moreover, an averaging system can intensify belief in the dominant subset by using average values in the combining rule. Averaging also easily incorporates the assignment of higher weights to more reliable evidence.

Using actual belief functions, rather than averages, in the combining rule offers correspondence to Bayesian theory, as well as convergence. This method is computationally simpler when making decisions with incremental evidence. In this situation, Dempster–Shafer combination is recommended as the primary uncertainty management system, accompanied by weighted averages to track the accumulated mass and warn of possible errors.

References

[1] J.F. Baldwin, A calculus for mass assignments in evidential reasoning, in: R. Yager, J. Kacprzyk, M. Fedrizzi (Eds.), Advances in the Dempster–Shafer Theory of Evidence, Wiley, New York, 1994, pp. 513–531.
[2] J.F. Baldwin, Combining evidences for evidential reasoning, International Journal of Intelligent Systems 6 (1991) 569–616.
[3] B.G. Buchanan, E.H. Shortliffe, A model of inexact reasoning in medicine, Mathematical Biosciences 23 (1975) 351–379.
[4] M.R.B. Clarke, Discussion of belief functions, by P. Smets, in: P. Smets, A. Mamdani, D. Dubois, H. Prade (Eds.),


CHAPTER 7

BELIEF FUNCTIONS

INTRODUCTION

by Glenn Shafer

The theory of belief functions provides a non-Bayesian way of using mathematical probability to quantify subjective judgements. Whereas a Bayesian assesses probabilities directly for the answer to a question of interest, a belief-function user assesses probabilities for related questions and then considers the implications of these probabilities for the question of interest.

Though antecedents for belief functions can be found in the seventeenth and eighteenth centuries, the theory in its present form is due to the work of A.P. Dempster in the 1960s and my own work in the 1970s. For this reason, it is sometimes called the "Dempster-Shafer theory." It came to the attention of workers in artificial intelligence in the 1980s, in part because of its resemblance to the less systematic calculus of certainty factors developed for MYCIN by Edward Shortliffe and Bruce Buchanan (see Chapter 5).

My 1976 book, A Mathematical Theory of Evidence, remains the most comprehensive single reference for the mathematical theory of belief functions, but it has been followed by a large literature on interpretation, application, and computation. This chapter presents a selection from that literature.

The first two articles in the chapter represent recent thinking by Dempster, myself, and our co-workers. The article by Rajendra Srivastava and me introduces the belief-function formalism and compares it with the Bayesian formalism in the context of an important application, financial auditing. The article by Dempster and Augustine Kong illustrates how belief functions can be used in belief networks.

The third and fourth articles, the article by Jean Gordon and Edward Shortliffe, and the article by Judea Pearl, provide alternative introductions to belief functions. Gordon and Shortliffe present the theory in the context of a medical application, and they compare it with the framework and rules of combination of MYCIN.
Pearl's article takes a more critical look at belief functions.

The fifth article, by Prakash Shenoy and me, extends to belief functions the methods of local computation in belief networks that are presented for Bayesian networks in Chapter 6. Shenoy and I show that these methods apply to a broad class of formalisms, including the Bayesian formalism, belief functions, Wolfgang Spohn's natural conditional functions, and any other formalism that has operations of combination and marginalization that satisfy certain simple axioms.

The last two articles describe consultation systems that help users represent and combine belief functions. The article by John Lowrance, Thomas Garvey, and Thomas Strat describes an early version of the Gister system, which helps users build up belief networks graphically. This article is significant not only because of Gister itself, but also because Lowrance and his co-workers at SRI International were influential in bringing belief functions into artificial intelligence. The article by Debra Zarley, Yen-Teh Hsia, and me describes DELIEF, a system developed at the University of Kansas. This system goes beyond the early versions of Gister by using the mathematics of belief networks to carry out the combination automatically.

This introduction surveys some of the issues in the belief-function literature and provides references for further reading. For a fuller survey of current issues, see Shafer (1990).

1. The Basics of Belief-Function Theory

The theory of belief functions is based on two ideas: the idea of obtaining degrees of belief for one question from subjective probabilities for a related question, and Dempster's rule for combining such degrees of belief when they are based on independent items of evidence.

These ideas are illustrated by example in several of the articles in this chapter. The simplest, perhaps, is the example of testimony discussed by Srivastava and me.
We can derive degrees of belief for statements made by witnesses from subjective probabilities for the reliability of these witnesses.

Degrees of belief obtained in this way differ from probabilities in that they may fail to add to 100%. Suppose, for example, that Betty tells me a tree limb fell on my car. My subjective probability that Betty is reliable is 90%; my subjective probability that she is unreliable is 10%. Since they are probabilities, these numbers add to 100%. But Betty's statement, which must be true if she is reliable, is not necessarily false if she is unreliable. From her testimony alone, I can justify a 90% degree of belief that a limb fell on my car, but only a 0% (not 10%) degree of belief that no limb fell on my car. (This 0% does not mean that I am sure that no limb fell on my car, as a 0% probability would; it merely means that Betty's testimony gives me no reason to believe that no limb fell on my car.) The 90% and the 0%, which do not add to 100%, together constitute a "belief function."

In this example, we are dealing with a question that has only two answers (Did a limb fall on my car? Yes or no.). Belief functions can also be derived for questions for which there are more than two answers. In this case, we will have a degree of belief for each answer and for each set of answers. If the number of answers (or the size of the "frame") is large, the belief function may be very complex.

Dempster's rule is based on the standard idea of probabilistic independence, applied to the questions for which we have subjective probabilities. I can use the rule to combine evidence from two witnesses if I consider the first witness's reliability subjectively independent (before I take account of what the witnesses say) of the second's reliability. (This means that finding out whether one witness is reliable would not change my subjective probability for whether the other is reliable.)
The rule uses this subjective independence to determine joint probabilities for various possibilities as to which of the two are reliable.

Though it begins with an initial judgment of independence, Dempster's rule goes beyond this independence. After using independence to compute joint probabilities for who is reliable, I must check whether some possibilities are ruled out by what the witnesses say. (If Betty says that a tree limb fell on my car, and Sally says nothing fell on my car, then they cannot both be reliable.) If so, I renormalize the probabilities of the remaining possibilities so they add to one. This is an example of probabilistic conditioning, and it may destroy the initial independence. (After I notice that Betty and Sally have contradicted each other, their reliabilities are no longer subjectively independent for me. Now finding out that one is reliable would tell me that the other is not.) Then I determine what each possibility for the reliabilities implies about the truth of what the witnesses said, and I use the renormalized probabilities to get new degrees of belief.

The net effect of Dempster's rule is that concordant items of evidence reinforce each other (two independent witnesses for a limb falling on my car make me believe it more than either alone), conflicting items of evidence erode each other, and a chain of reasoning is weaker than its weakest link. Section 2.3 of the article by Srivastava and me illustrates all these points by example.

Bayesian probability measures qualify as belief functions; they represent, in effect, the special case where the unreliable witness always lies. Another special case is that of categorical, completely non-probabilistic information. This is the case where we are 100% confident in the reliability of a witness or other evidence, yet this evidence does not completely answer our question.
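The combination arithmetic just described can be sketched numerically. The following Python fragment is my own illustration, not code from the article; the mass-function representation (sets of answers mapped to probability mass) and all names in it are assumptions made for the sketch:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions (dict: frozenset -> mass)."""
    combined, conflict = {}, 0.0
    for (a, p), (b, q) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            combined[c] = combined.get(c, 0.0) + p * q
        else:
            conflict += p * q  # mass on contradictory pairs, renormalized away
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

def belief(m, hypothesis):
    """Degree of belief in a set of answers: total mass committed to its subsets."""
    return sum(v for s, v in m.items() if s <= hypothesis)

FRAME = frozenset({"limb fell", "no limb fell"})
# A 90%-reliable witness: 90% of the mass goes to her statement, and the
# remaining 10% stays on the whole frame (committed to nothing in particular).
betty = {frozenset({"limb fell"}): 0.9, FRAME: 0.1}
second_witness = {frozenset({"limb fell"}): 0.9, FRAME: 0.1}

both = dempster_combine(betty, second_witness)
print(round(belief(both, frozenset({"limb fell"})), 2))    # 0.99
print(round(belief(both, frozenset({"no limb fell"})), 2)) # 0.0
```

With two concordant, independent 90% witnesses the combined degree of belief rises to 1 - 0.1 * 0.1 = 0.99, while the degree of belief against remains 0: exactly the reinforcement of concordant evidence described above.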
I might have conclusive evidence, for example, that I lost my wallet in one of three places, without any clue as to which one. This calls for a belief function that assigns a degree of belief of 100% to the three places as a set, but a degree of belief of 0% to each of the three individually.

As Dempster and Kong emphasize in their article in this chapter, the ability to represent both probabilistic and categorical information makes belief functions a bridge between Bayesian and categorical reasoning. Bayesian conditioning itself can be understood in terms of this bridge. Conditioning a Bayesian probability measure on given information is equivalent to combining it, by Dempster's rule, with a categorical belief function representing that information (Shafer 1976, pp. 66-67).

2. Belief Functions Do Not Express Lower Bounds on True but Unknown Probabilities

Mathematically, the degrees of belief given by a single belief function can be related to lower bounds on probabilities, but conceptually they must be sharply distinguished from such lower bounds. If we make up numbers by thinking of them as lower bounds on true probabilities, and we then combine these numbers by Dempster's rule, we are likely to obtain erroneous and misleading results.

It is easy to see the temptation to interpret belief-function degrees of belief as lower bounds on unknown true probabilities. Consider again my 90% belief that a limb fell on my car, and my 0% belief that no limb fell on my car. These degrees of belief were derived from my 90% and 10% subjective probabilities for Betty being reliable or unreliable. Suppose these subjective probabilities were based on my knowledge of the frequency with which witnesses like Betty are reliable. Then I might think that the 10% of witnesses like Betty who are not reliable make true statements a definite (though unknown) proportion of the time and false statements the rest of the time.
Were this the case, I could think in terms of a large population of statements made by witnesses like Betty. In this population, 90% of the statements would be true statements by reliable witnesses, x% would be true statements by unreliable witnesses, and (10-x)% would be false statements by unreliable witnesses, where x is an unknown number between 0 and 10. The total chance of getting a true statement from this population would be (90+x)%, and the total chance of getting a false statement would be (10-x)%. My degrees of belief of 90% and 0% are lower bounds on these chances; since x is anything between 0 and 10, 90% is the lower bound for (90+x)%, and 0% is the lower bound for (10-x)%.

As this example suggests, there is a sense in which a single belief function can always be interpreted as a consistent system of probability bounds. It is always possible to find a probability distribution such that each probability is greater than the corresponding degree of belief given by the belief function.

The fallaciousness of the probability-bound interpretation of belief functions becomes clear, however, when we consider two or more belief functions addressing the same question but representing different and possibly conflicting items of evidence. The disagreements that such belief functions represent are not disagreements about the values of true probabilities. When Betty says a limb fell on my car, and Sally says nothing fell on my car, they are disagreeing about whether something fell on my car, not about the true probability of something having fallen on my car.

Were we to insist on a probability-bound interpretation of belief functions, then we would only be interested in groups of belief functions whose degrees of belief, when interpreted as probability bounds, can be satisfied simultaneously.
When belief functions are given their proper interpretation, however, it is of no particular significance whether there exist probabilities that simultaneously satisfy the bounds defined by a whole group of belief functions. Consider two cases that might arise when we use belief functions to represent contradictory evidence from Betty and Sally:

Case 1. Before hearing their testimony, we think highly of the reliability of both Betty and Sally. We represent Betty's evidence by a belief function that gives a 95% degree of belief to a limb having fallen on my car, and we represent Sally's evidence by a belief function that gives a 95% degree of belief to nothing having fallen on my car. In this case, the two belief functions are contradictory as probability bounds; if the true probability of a limb having fallen on my car is greater than 95%, then the true probability of nothing having fallen on my car cannot also be greater than 95%.

Case 2. Before hearing their testimony, we think that both Betty and Sally are fairly unreliable. So in both belief functions, we assign a 35% degree of belief rather than a 95% degree of belief. In this case, the two belief functions define consistent probability bounds; the true probability of a limb having fallen on my car and of nothing having fallen on my car can both be greater than 35%.

From the belief-function point of view, there is no conceptual difference between these two cases. In both cases, we can combine the two belief functions by Dempster's rule. In both cases, there is conflict in the evidence being combined, and normalization is required.

It can be shown that if no renormalization is required in the combination of a group of belief functions by Dempster's rule, then there do exist consistent probabilities that simultaneously bound all the belief functions being combined as well as the belief function that results from the combination. We cannot count on this, however, when renormalization is required.
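The two cases can be worked through numerically. The Python sketch below is my own illustration (the function and its arguments are not from the chapter): each witness contributes a simple support function on the two-answer frame, the conflict is the product of the two committed masses, and renormalization divides it out.

```python
def combine_contradictory(bel_limb, bel_nothing):
    """Dempster-combine a witness for 'a limb fell' (degree bel_limb) with a
    witness for 'nothing fell' (degree bel_nothing) on a two-answer frame."""
    conflict = bel_limb * bel_nothing             # both reliable: impossible
    k = 1.0 - conflict                            # renormalization constant
    return (bel_limb * (1.0 - bel_nothing) / k,   # combined belief a limb fell
            (1.0 - bel_limb) * bel_nothing / k)   # combined belief nothing fell

# Case 1: both witnesses trusted at 95%
print(tuple(round(x, 3) for x in combine_contradictory(0.95, 0.95)))  # (0.487, 0.487)
# Case 2: both witnesses trusted at only 35%
print(tuple(round(x, 3) for x in combine_contradictory(0.35, 0.35)))  # (0.259, 0.259)
```

Dempster's rule proceeds identically in both cases; the only difference is how much conflict (0.9025 in Case 1 versus 0.1225 in Case 2) is renormalized away.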
Consequently, authors who favor a probability-bound interpretation of belief functions are uncomfortable with renormalization (see, e.g., Zadeh 1986).

Probability bounds provide the basis for yet another mathematical theory of evidence, which I have called the “theory of lower probability” (Shafer 1981). In this theory, an analogy is drawn between actual evidence and knowledge of bounds on unknown true probabilities for the question of interest. This theory is not always useful, because unknown true probabilities exist only if a population and sampling scheme are well defined. An unknown true probability for the truth of Betty's statement, for example, exists only if the population of true and false statements of witnesses like Betty is well-defined. In a problem where a reference population for the question of interest is well-defined, the theory of lower probability may be more useful than the theory of belief functions. But in other problems belief functions may be more useful.

As Amos Tversky and I explain in our article in Chapter 2, both Bayesian and belief-function arguments involve analogies to sampling situations. But belief-function analogies are less complete than Bayesian analogies. They are useful when it is reasonable to evaluate certain evidence (e.g., Betty's reputation) using a sampling analogy, but this evidence will not support extending the analogy to all potentially relevant issues (what Betty would say were my good impression of her erroneous, how often limbs fall from that tree, etc.).

There has been some confusion about the original relation between belief functions and probability bounds, because some of Dempster's early articles hinted at a probability-bound interpretation. But Dempster's “upper and lower probabilities” were not derived by bounding unknown true probabilities. My 1976 book, in which the term “belief function” was introduced, explicitly disavowed any probability-bound interpretation (p. ix).
This disavowal was elaborated at length in Shafer (1981) and seconded by Dempster (1982).

3. General Metaphors and Canonical Examples

The metaphor of the witness who may or may not be reliable can serve as a standard of comparison, or canonical example, for judging the strength of other evidence. We can assess given evidence by saying that it is comparable in strength to the evidence of a witness who has a certain chance of being reliable.

A witness testifying to a specific proposition leads to a relatively simple belief function—one that gives a specified degree of belief to that proposition and its consequences, and zero degree of belief to all other propositions. Arbitrarily complex belief functions can be built up by combining such simple belief functions (Shafer 1976, p. 200), but in some cases we may want to produce complex belief functions more directly, in order to represent evidence that conveys a complex or mixed message but cannot be broken down into independent components. This requires more complex metaphors or canonical examples.

Two distinct general metaphors have been suggested. Shafer (1981) suggests the metaphor of a randomly coded message. Pearl (1988) suggests the metaphor of random switches.

Shafer's randomly coded messages are explained in detail in the article by Shafer and Tversky in Chapter 2. In this metaphor, we have probabilities for which of several codes was used to encode a message. We do not yet know what the message says, but we know it is true. We have this message in hand in its coded form, and we will try to decode it using each code, but the probabilities are judgments we make before this decoding. When we do decode using the different codes, we sometimes get nonsense, and we sometimes get a comprehensible statement. It seems sensible, in this situation, to condition our probabilities for the codes by eliminating the ones with which we get nonsense.
The conditioned probability for each remaining code can then be associated with the statement we get by decoding using that code. These statements may be related in various ways; some may be inconsistent with each other, and some may be stronger than others. Thus we obtain the complexity of an arbitrary belief function.

In this metaphor, the independence of two belief functions means that two different people independently choose codes with which to send two possibly different (though both true) messages. Our uncertainties about the codes in the two cases remain independent unless possible codes imply contradictory messages. If s1 is a possible code for the first person, and s2 is a possible code for the second person, and the first message as decoded by s1 contradicts the second message as decoded by s2, then it cannot be true that these were the two codes used. We eliminate such pairs of codes and renormalize the probabilities of the remaining possible pairs. The probability of each pair is then associated with the conjunction of the two implied messages. This is Dempster's rule.

The metaphor can be presented in a way that forestalls the interpretation of belief-function degrees of belief in terms of bounds on probabilities. There is no probability model for the choice of the true message sent. The probabilities are only for the choice of codes. We might visualize these probabilities in terms of a repetition of the choice of codes, but since the true message can vary arbitrarily over this population of repetitions, the idea of this population does not lead to the idea of a true unknown probability for the true message or for the true answer to the question of interest. It leads only to an argument about what the true message says or implies—an argument whose strength can be represented in terms of the derived “degrees of belief.”

Pearl's metaphor of random switches is explained in detail in his article in this chapter.
In this metaphor, a switch oscillates randomly among propositions about the question of interest. Our probabilities are probabilities for the position of the switch, but when the switch points to a certain proposition, this indicates that the proposition is to be adopted as an assumption or axiom for further reasoning. A given proposition about the question of interest may be implied by several of these possible axioms, and its total degree of belief will be the total probability that an axiom implying it is adopted—the total probability, to speak more briefly, that it is proven.

Pearl's metaphor seems well fitted for computer science, since it mixes the language of electrical engineering with that of symbolic logic. I find it difficult, however, to construe the metaphor in a way that completely avoids interpretation in terms of bounds on true probabilities. The random-code metaphor allows us to interpret the probabilities for a related question in terms of a population of repetitions completely unconnected with the question of interest. But since the switch positions in Pearl's metaphor are defined in terms of axioms about the question of interest, it seems to me that each repetition of the random selection of switch positions will generally constrain the true answer to the question of interest. Thus the probabilities do bear directly on the question of interest, and this leads to the objections to renormalization I discussed in the preceding section. In Pearl's view, however, his metaphor is compatible with renormalization (section 5.2 of his article in this section).

4. Sorting Evidence into Independent Items

Dempster's rule should be used to combine belief functions that represent independent items of evidence. But when are items of evidence independent? How can we tell?
These are probably the questions asked most frequently about belief functions.

The independence required by Dempster's rule is simply probabilistic independence, applied to the questions for which we have probabilities, rather than directly to the question of interest. In the metaphor of the randomly coded messages, this means that the codes are selected independently. In the more specialized metaphor of independent witnesses, it means that the witnesses (or at least their current properties as witnesses) are selected independently from well-defined populations.

Whether two items of evidence are independent in a real problem is a subjective judgment, in the belief-function as in the Bayesian approach. There is no objective test.

In practice, our task is to sort out the uncertainties in our evidence. When items of evidence are not subjectively independent, we can generally identify what uncertainties they have in common, thus arriving at a larger collection of items of evidence that are subjectively independent. Typically, this maneuver has a cost—it forces us to refine, or make more detailed, the frame over which our belief functions are defined.

We can illustrate this by adapting an example from Pearl's “Bayes Decision Methods,” in Chapter 6. Suppose my neighbor Mr. Watson calls me at my office to say he has heard my burglar alarm. In order to assess this testimony in belief-function terms, I assess probabilities for the frame

S1 = {Watson is reliable, Watson is not reliable}.

Here Watson being reliable means he is honest and he can tell whether it is my burglar alarm he is hearing. I can use these probabilities to get degrees of belief for the frame

T = {My alarm sounded, My alarm did not sound}.

Putting a probability of 90%, say, on Watson being reliable, I get a 90% degree of belief that my burglar alarm sounded, and a 0% degree of belief that my burglar alarm did not sound.

I now call another neighbor, Mrs. Gibbons, who verifies that my alarm sounded.
I can assess her testimony in the same way, by assessing probabilities for the frame

S2 = {Gibbons is reliable, Gibbons is not reliable}.

Suppose I put a probability of 95% on Gibbons being reliable, so that I obtain a 95% degree of belief that my burglar alarm sounded, and a 0% degree of belief that it did not sound.

Were I to combine these two belief functions by Dempster's rule, I would obtain an overall degree of belief of 99.5% that my burglar alarm sounded. This is inappropriate, however, for the two items of evidence involve a common uncertainty—whether there might have been some other noise similar to my burglar alarm.

In order to deal with this problem, I must pull my skepticism about the possibility of a similar noise out of my assessment of Watson's and Gibbons' reliability, and identify my grounds for this skepticism as a separate item of evidence. So I now have three items of evidence—my evidence for Watson's honesty (I say honesty now instead of reliability, since I am not including here the judgment that there are no other potential noises in the neighborhood that Watson might confuse with my burglar alarm), my evidence for Gibbons' honesty, and my evidence that there are no potential noises in the neighborhood that sound like my burglar alarm.

These three items of evidence are now independent, but their combination involves more than the frame T. In its place, we need the frame U = {u1, u2, u3}, where

u1 = My alarm sounded,
u2 = There was a similar noise,
u3 = There was no noise.

(Let us exclude, for simplicity of exposition, the possibility that there were two noises, my alarm and also a similar noise.) My first two items of evidence (my evidence for Watson's and Gibbons' honesty) both provide a high degree of belief in {u1, u2}, while the third item (my evidence against the existence of other noise sources) provides a high degree of belief in {u1, u3}.
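This three-item combination can be sketched numerically. In the Python fragment below (my own illustration, not code from the chapter), the 90% for Watson and 95% for Gibbons are the figures used above, while the 90% for the no-similar-noise evidence is an assumed value chosen for the sketch:

```python
from functools import reduce
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule over mass functions (dict: frozenset -> mass)."""
    combined, conflict = {}, 0.0
    for (a, p), (b, q) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            combined[c] = combined.get(c, 0.0) + p * q
        else:
            conflict += p * q  # mass on contradictory pairs, renormalized away
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

U = frozenset({"u1", "u2", "u3"})  # alarm / similar noise / no noise
watson   = {frozenset({"u1", "u2"}): 0.90, U: 0.10}  # honesty: some noise occurred
gibbons  = {frozenset({"u1", "u2"}): 0.95, U: 0.05}
no_other = {frozenset({"u1", "u3"}): 0.90, U: 0.10}  # assumed: no similar sources

m = reduce(dempster_combine, [watson, gibbons, no_other])
belief_alarm = sum(v for s, v in m.items() if s <= frozenset({"u1"}))
print(round(belief_alarm, 4))  # 0.8955
```

Note that Watson and Gibbons alone only support {u1, u2} (some noise occurred); it is the third item that narrows the combined belief down to {u1}, the alarm itself.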
Combining the three by Dempster's rule produces a high degree of belief in {u1}.

A Bayesian approach to this problem would be somewhat different, but it too would involve refining the frame T to U or something similar. In the Bayesian case, we would ask whether the events “Watson says he heard a burglar alarm” and “Gibbons says she heard a burglar alarm” are subjectively independent. They are not unconditionally independent, but they are independent conditional on a specification of what noise actually occurred. I can exploit this conditional independence in assessing my subjective probabilities, but in order to do so, I must bring the possibility of other noises into the frame.

In the belief-function approach, one talks not about conditional independence of propositions, but rather about the overlapping and interaction of evidence. For further explanation and more examples, see Shafer (1976, Chapter 8), Shafer (1981), Shafer (1984), Shafer (1987), and Srivastava, Shenoy, and Shafer (1989).

5. Frequency Thinking

A common but dangerous temptation is to use Dempster's rule to combine opinions that are really fragments of information about a single probability distribution. This is usually inappropriate and can give misleading results.

Suppose we are concerned with a bird Tweety. We want to know whether Tweety flies and whether Tweety is a penguin. We decide to make judgments about this by thinking of Tweety as randomly selected from a certain population of birds. We have guesses about the proportion of birds in this population that fly and the proportion that are penguins. Should these guesses be represented as belief functions over a set of statements about Tweety, and then combined by Dempster's rule? No. Both guesses bear on the particular bird only through their accuracy as guesses about the population. This means that they have in common the uncertainty involved in choosing Tweety at random from the population.
Depending on how we obtained the guesses, they may also have other uncertainties in common.

Like every problem of dependence, we can deal with this problem within the belief-function approach by sorting out the uncertainties and properly refining our frame. In this case, we must bring the possible values for the population frequencies into the frame. We can then formalize the connection between these frequencies and Tweety as one of our items of evidence. We must also identify our sources of evidence about the frequencies, sort out their uncertainties, and use them to assess belief functions about what these frequencies are.

This brings us into the difficult realm of statistical inference. Belief functions provide only one of many approaches to statistical inference, and even the possible belief-function techniques are varied and complex (Dempster 1966, 1967a,b, 1968a,b, 1969; Chapter 11 of Shafer 1976; Shafer 1982a,b). No statistical approach is likely to be of much value unless considerable frequency information is available.

An alternative is the Bayesian approach, in which (according to the constructive interpretation) we compare our hunches about Tweety to fairly precise knowledge of frequencies in a population. One of the messages of Pearl's article on probabilistic semantics in Chapter 9, with which I agree, is that many of the intuitions discussed in the literature on non-monotonic logic are best represented in this way, in spite of the authors' protestations that their intuitions are not probabilistic.

In section 7 of his article in this chapter, Pearl gives a number of examples in which Dempster's rule gives inappropriate results. I believe the intuitions in most of these examples are based on frequency thinking, and I agree that Dempster's rule is not appropriate for such problems. If there is sufficient evidence on which to base detailed probability judgments in analogy to frequency knowledge, then the Bayesian approach will be far more useful.
In some other cases, a lower-probability analysis may be more useful.
