哈工大深圳机器学习_丁宇新_第六章答案

合集下载

机器学习参考答案

第7章机器学习参考答案7-6ID3解：设根节点为S,尽管它包含了所有的训练例子，但却没有包含任何分类信息，因此具有最大的信息熵。

即：H(S)= - (P(+)log 2 P(+) + P(-)log2 P(-))式中P(+)=3/6，P(-)=3/6分别是决策方案为“ +”或“-”时的概率。

因此有H(S)= - ((3/6)log 2(3/6) + (3/6)log 2(3/6))=1按照ID3算法，需要选择一个能使S的期望熵为最小的一个属性对根节点进行扩展，因此我们需要先计算S关于每个属性的条件熵：H(S|x i)= ( |S T| / |S|)* H(S T) + ( |S F| / |S|)* H(S F)其中，T和F为属性人的属性值，S T和S F分别为X j=T或X i=F时的例子集，|S、| S T|和|S F|分别为例子集S、S T和S F的大小。

下面先计算S关于属性X1的条件熵：在本题中，当x1=T时，有：S T={1，2，3}当X1=F时，有：S F={4，5, 6}其中，S T和S F中的数字均为例子集S中的各个例子的序号，且有|S|=6, | S T |=| S F |=3。

由S T可知，其决策方案为“ +”或“-”的概率分别是：P ST(+)=2/3P ST (-)=1/3因此有：H(S T)= - (P ST (+)log2 P ST (+) + P ST (-)log2 P ST (-))=-((2⑶log 2(2/3) + (1/3)log 2(1/3))=0.9183再由S F可知，其决策方案为“ +”或“-”的概率分别是：P SF (+)=1/3P SF (-)=2/3则有：H (S F)= - (P SF (+)log 2 P SF (+) + P SF (-)log 2 P SF (-))=-((1 ⑶log 2(1/3)+ (2/3)log 2(2/3))=0.9183将H(S T)和H (S F)代入条件熵公式，有：H(S|X1)=(|S T|/|S|)H(S T)+(|S F|/|S|)H(S F)=(3/6) * 0.9183 + (3/6)* 0.9183=0.9183下面再计算S关于属性X2的条件熵：在本题中，当X2=T时，有：S T={1 , 2, 5, 6}当x2=F时，有：S F={3 , 4}其中，S T和S F中的数字均为例子集S中的各个例子的序号，且有|S|=6, | S T |=4, | S F |=2。

机器学习模拟题与参考答案

机器学习模拟题与参考答案一、单选题（共114题，每题1分，共114分）1.机器学习这个术语是由( )定义的？A、Arthur SamuelB、Guido van RossumC、James GoslingD、以上都不是正确答案：A2.在一个线性回归问题中，我们使用 R 平方（R-Squared）来判断拟合度。

此时，如果增加一个特征，模型不变，则下面说法正确的是？A、如果 R-Squared 增加，则这个特征有意义B、如果R-Squared 减小，则这个特征没有意义C、仅看 R-Squared 单一变量，无法确定这个特征是否有意义。

D、以上说法都不对正确答案：C3.在SVM中, margin的含义是()A、损失误差B、间隔C、幅度D、差额正确答案：B4.下列哪种方法可以用来缓解过拟合的产生：( )。

A、正则化B、增加更多的特征C、以上都是D、增加模型的复杂度正确答案：A5.当数据分布不平衡时，我们可采取的措施不包括( )。

A、对数据分布较少的类别过采样B、对数据分布较多的类别欠采样C、对数据分布较少的类别赋予更大的权重D、对数据分布较多的类别赋予更大的权重正确答案：D6.同质集成中的个体学习器亦称（）A、异质学习器B、同质学习器C、基学习器D、组件学习器正确答案：C7.以下哪些是无序属性（）A、｛小，中，大｝B、闵可夫斯基距离C、｛飞机，火车、轮船｝D、｛1，2，3｝正确答案：C8.下列关于过拟合的说法错误的是A、过拟合是指模型在训练集上表现很好，但是在交叉验证集和测试集上表现一般B、解决过拟合可以采用Dropout方法C、解决过拟合可以采用参数正则化方法D、数据集扩增不能用来解决过拟合问题正确答案：D9.神经网络算法有时会出现过拟合的情况，那么采取以下哪些方法解决过拟合更为可行（）。

A、减少训练数据集中数据的数量B、增大学习的步长C、为参数选取多组初始值，分别训练，再选取一组作为最优值D、设置一个正则项减小模型的复杂度正确答案：D10.下列是机器学习中降维任务的准确描述的为A、依据某个准则对项目进行排序B、将其映射到低维空间来简化输入C、预测每个项目的实际值D、对数据对象进行分组正确答案：B11.对于在原空间中线性不可分问题，支持向量机（）。

哈工大深圳机器学习复习4_丁宇新

Machine LearningQuestion one:（举一个例子，比如:导航仪、西洋跳棋）Question two:Initilize: G={?,?,?,?,?,?} S={,,,,,}Step 1:G={?,?,?,?,?,?} S={sunny,warm,normal,strong,warm,same} Step2: coming one positive instance 2G={?,?,?,?,?,?} S={sunny,warm,?,strong,warm,same}Step3: coming one negative instance 3G=<Sunny,?,?,?,?,?> <?,warm,?,?,?,?> <?,?,?,?,?,same>S={sunny,warm,?,strong,warm,same}Step4: coming one positive instance 4S= { sunny,warm,?,strong,?,? }G=<Sunny,?,?,?,?,?> <?,warm,?,?,?,?>Question three:(a)Entropy(S)=og(3/5) og(2/5)= 0.971(b)Gain(S,sky) = Entropy(S) –[(4/5) Entropy(Ssunny) + (1/5) Entropy(Srainny)] = 0.322Gain(S,AirTemp) = Gain(S,wind) = Gain(S,sky) =0.322Gain(S,Humidity) = Gain(S,Forcast) = 0.02Gain(S,water) = 0.171Choose any feature of AirTemp, wind and sky as the top node.The decision tree as follow: (If choose sky as the top node)Question Four:Answer:Inductive bias: give some proor assumption for a target concept made by the learner to have a basis for classifying unseen instances.Suppose L is a machine learning algorithm and x is a set of training examples. L(xi, Dc) denotes the classification assigned to xi by L after training examples on Dc. Then the inductive bias is a minimal set of assertion B, given an arbitrary target concept C and set of trainingexamples Dc: (Xi ) [(B Dc Xi) -| L(xi, Dc)]C_E: the target concept is contained in the given gypothesis space H, and the training examples are all positive examples.ID3: a, small trees are preferred over larger trees.B, the trees that place high information gain attribute close to root are preferred over those that do not.BP:Smooth interpolation beteen data points.Question Five:Answer: In naïve bayes classification, we assump that all attributes are independent given the tatget value, while in bayes belif net, it specifes a set of conditional independence along with a set of probability distribution.Question Six：随即梯度下降算法Question Seven：朴素贝叶斯例子Answer:Question nine:Single-point crossover:Crossover mask: 11111100000 or 11111000000 or 1111 0000000 or 00001111111Two-point crossover:Offspring: (11001011000, 00101000101)Uniform crossover:Crossover mask: 10011110011 or 10001110011 or 01111101100 or 10000010011 or10011110001 01100001100Point mutation:Any mutation is ok!。

机器学习（慕课版）习题答案全集

机器学习（慕课版）习题答案全集机器学习（慕课版）习题答案目录第一章机器学习概述 (2)第二章机器学习基本方法 (5)第三章决策树与分类算法 (9)第四章聚类分析 (13)第五章文本分析 (17)第六章神经网络 (22)第七章贝叶斯网络 (26)第八章支持向量机 (31)第九章进化计算 (32)第十章分布式机器学习 (34)第十一章深度学习 (35)第十二章高级深度学习 (37)第十三章推荐系统 (39)第一章机器学习概述1.机器学习的发展历史上有哪些主要事件？机器学习发展分为知识推理期、知识工程期、浅层知识期和深度学习几个阶段，可从几个阶段选择主要历史事件作答。

2.机器学习有哪些主要的流派?它们分别有什么贡献？符号主义：专家系统、知识工程贝叶斯派：情感分类、自动驾驶、垃圾邮件过滤联结主义：神经网络进化主义：遗传算法行为类推主义3.讨论机器学习与人工智能的关系机器学习是人工智能的一个分支，作为人工智能核心技术和实现手段，通过机器学习的方法解决人工智能面对的问题4.讨论机器学习与数据挖掘的关系数据挖掘是从大量的业务数据中挖掘隐藏、有用的、正确的知识促进决策的执行。

数据挖掘的很多算法都来自于机器学习，并在实际应用中进行优化。

机器学习最近几年也逐渐跳出实验室，解决从实际的数据中学习模式，解决实际问题。

数据挖掘和机器学习的交集越来越大，机器学习成为数据挖掘的重要支撑技术5.讨论机器学习与数据科学、大数据分析等概念的关系数据科学主要包括两个方面：用数据的方法研究科学和用科学的方法研究数据。

前者包括生物信息学、天体信息学、数字地球等领域；后者包括统计学、机器学习、数据挖掘、数据库等领域。

大数据分析即是后者的一个部分。

一般使用机器学习这个工具做大数据的分析工作，也就是说机器学习是我们做大数据分析的一个比较好用的工具，但是大数据分析的工具并不止机器学习，机器学习也并不只能做大数据分析。

人工智能机器学习技术章节习题及答案期末考试试卷题库及答案 (1)

第6章参考答案1,熵是表示随机变量不确定地度量,是对所有可能发生地事件产生地信息量地期望。

对于二分类来说,即一个随机事件有两种可能性：发生与不发生,设发生地概率为p,则不发生地概率为1-p,于是其熵地计算公式为：22H *log (1)*log (1)p p p p =----假如一个随机事件有三种可能地状态：状态1,状态2,状态3,其概率分别为P 1,P 2,P 3,则有P 1+P 2+P 3=1,于是它地熵地计算公式为：121222323H *log *log *log p p p p p p =---熵越大,不确定性就越大,熵越小,不确定性就越小（即确定性越大）。

极端地情况下,假如一个随机事件必然要发生,即其概率为1,那么它就没有不确定性了,此时熵最小为0.条件熵是给定某种条件地情况下重新计算地熵,假设给定地条件X 有状态1,状态2两种可能性,则其概率分别为P x1,P x2,则条件熵地计算公式为 1122H **x x x x x p H p H =+其中H x1,H x2分别为X 在状态1与状态2下计算地熵。

很明显,当给出一定地条件时,随机事件地不确定性减小,即条件熵要小于熵。

熵与条件熵地差就是信息增益。

信息增益表示地是当给出一定地条件时,事件地不确定性减少地程度。

2,采用ID3算法,先计算总地熵,再计算各个条件熵,然后计算出信息增益,选择信息增益最大地那个属性进行划分,然后再对各个子树进行同样地步骤划分。

参考决策树如图所示。

第6章第2题决策树3,贝叶斯概率公式为：有房子贷款有工作不贷款信贷情况年龄不贷款不贷款贷款是是否否好一般老年中、青年()(A |)(|)()()(|)或者 (|)()(|)i i i i i iP B P B P B A P A P B P A B P B A P B P A B ==∑ 先验概率是指根据以前经验与分析得到地概率;后验概率是在事情已经发生地情况下,要求这件事情发生地原因是由某个因素引起地可能性地大小,即后验概率是已经知道结果,来推断该结果是由何种原因引起。

哈工大深圳机器学习08ML试题答案

1 Give the definitions or your comprehensions of the following terms.(12’)1.1 The inductive learning hypothesisP171.2 OverfittingP491.4 Consistent learnerP1482 Give brief answers to the following questions.(15’)VS, In general what is the smallest number of queries may 2.2 If the size of a version space is ||be required by a concept learner using optimal query strategy to perfectly learn the target concept?P272.3 In genaral, decision trees represent a disjunction of conjunctions of constrains on the attributevalues of instanse,then what expression does the following decision tree corresponds to ?3 Give the explaination to inductive bias, and list inductive bias of CANDIDATE-ELIMINATION algorithm, decision tree learning(ID3), BACKPROPAGATION algorithm.(10’)4How to solve overfitting in decision tree and neural network?(10’)Solution:●Decision tree:◆及早停止树增长(stop growing earlier)◆后修剪法(post-pruning)●Neural Network◆ 权值衰减(weight decay)◆ 验证数据集(validation set)5 Prove that the LMS weight update rule ^(()())i i train i V b V b x ωωη←+-performs a gradient descent to minimize the squared error. In particular, define the squared error E as in the text. Now calculate the derivative of E with respect to the weight i ω, assuming that ^()V b is a linear function as defined in the text. Gradient descent is achieved by updating each weight in proportion to iE ω∂-∂. Therefore, you must show that the LMS training rule alters weights in this proportion for each training example it encounters. ( ^2,() (()())train train b V b training example E V b V b 〈〉∈≡-∑) (8’) Solution ： As Vtrain(b)←ˆV(Successor(b)) we can get E=2ˆ(()())train V b Vb -∑ 0112233445566ˆ()w +w x +w x +w x +w x +w x +w x V b = ˆˆ/2(()())(()())/i train train iE w V b V b V b V b w -∂∂=--∂-∂ =ˆ2(()())train iV b V b x -- As mentioned in LMS:ˆ(()())i i train i V b V b x ωωη←+- We can get (/)i i i E w ωωη←+-∂∂/2ηη'=-Therefore, gradient descent is achievement by updating each weight in proportion to /i E w -∂∂; LMS rules alters weights in this proportion for each training example it encounters.6 True or false: if decision tree D2 is an elaboration of tree D1, then D1 is more-general-than D2. Assume D1 and D2 are decision trees representing arbitrary boolean funcions, and that D2 is an elaboration of D1 if ID3 could extend D1 to D2. If true give a proof; if false, a counter example. (Definition: Let j h and k h be boolean-valued functions defined over X .then j h is more_general_than_or_equal_to k h (written j g k h h ≥) If and only if()[(()1)(()1)]k j x X h x h x ∀∈=→= then ()()j k j g k k g j h h h h h h >⇔≥∧≥) (10’) The hypothesis is false.One counter example is A XOR Bwhile if A!=B, training examples are all positive,while if A==B, training examples are all negative,then, using ID3 to extend D1, the new tree D2 will be equivalent to D1, i.e., D2 is equal to D1.7 Design a two-input perceptron that implements the boolean function A B ∧⌝.Design a two-layer network of perceptrons that implements A XOR B . (10’)8 Suppose that a hypothesis space containing three hypotheses, 1h ,2h ,3h , and the posterior probabilities of these typotheses given the training data are 0.4, 0.3 and 0.3 respectively. And if a new instance x is encountered, which is classified positive by 1h , but negative by 2h and 3h ,then give the result and detail classification course of Bayes optimal classifier.(10’)P1259 Suppose S is a collection of training-example days described by attributes including Humidity, which can have the values High or Normal. Assume S is a collection containing 10 examples,[7+,3-]. Of these 10 examples, suppose 3 of the positive and 2 of the negative examples have Humidity = High, and the remainder have Humidity = Normal. Please calculate the information gain due to sorting the original 10 examples by the attribute Humidity.( log 21=0, log 22=1, log 23=1.58, log 24=2, log 25=2.32, log 26=2.58, log 27=2.8, log 28=3, log 29=3.16, log 210=3.32, ) (5’)Solution ：(a)Here we denote S=[7+,3-],then Entropy([7+,3-])= 227733log log 10101010-- =0.886; (b)i v v values(Humidity )Gain(S,Humidity)=Entropy(S)-Entropy(S )v S S ∈∑Gain(S,a2)Values(Humidity )={High, Normal}∴{|()}High S s S Humidity s High =∈=223322Entropy()=-log -log 0.9725555High S =,5High S ==4 ∴ 224411Entropy()=-log -log 0.725555Normal S = ,Normal S =5 Thus Gain (S,Humidity)=0.886-55(0.972*0.72)1010⨯+=0.0410 Finish the following algorithm. (10’)(1) GRADIENT-DESCENT(training examples, η) Each training example is a pair of the form ,x t , where x is the vector of input values,and t is the target output value. η is the learning rate (e.g., 0.05).● Initialize each i ω to some small random value● Until the termination condition is met, Do● Initialize each i ω∆ to zero.● For each ,x t in training_examples, Do● Input the instance x to the unit and compute the output o● For each linear unit weight i ω, Do● For each linear unit weighti ω, Do i i i ωωω←+∆(2) FIND-S Algorithm● Initialize h to the most specific hypothesis in H● For each positive training instance x● For each attribute constraint a i in hIfThendo nothingElsereplace a i in h by the next more general constraint that is satisfied by x● Output hypothesis h。

智慧树机器学习(山东联盟)章节测验答案

解忧书店 JieYouBookshop第一章单元测试1【单选题】 (5分)对西瓜的成熟度进行预测得到结果为0.51，这属于（）学习任务。

A.其余选项都不是B.聚类C.回归D.分类2【单选题】 (5分)在学习过程中，X表示数据集，Y是所有标记的集合，也称为（）。

A.样本集合B.属性集合C.输出空间D.函数3【单选题】 (5分)机器学习算法在学习过程中可能获得多个不同的模型，在解决“什么样的模型更好”这一问题时遵循“若有多个假设与观察一致，则选最简单的那个”，即（）原则。

A.奥卡姆剃刀B.没有免费的午餐C.迪米特法则D.里氏替换4【单选题】 (5分)机器学习是整个人工智能的核心，机器学习算法的特征之一就是（）。

A.类别B.模型C.特征D.数据5【单选题】 (5分)模型的泛化能力是指A.适用于训练集样本的能力B.适用于测试集样本的能力C.适用于新样本的能力D.适用于验证集样本的能力6【多选题】 (5分)下列关于学习算法的说法正确的是A.在某些问题上表现好的学习算法，在另一些问题上却可能不尽人意B.学习算法自身的归纳偏好与问题是否相配通常并不起决定性的作用C.学习算法必须有某种偏好，才能产出它认为“正确”的模型D.要谈论算法的相对优劣，必须要针对具体的学习问题7【多选题】 (5分)获得假设（模型）空间时，从特殊到一般的过程属于A.泛化B.归纳C.特化D.演绎8【多选题】 (5分)机器学习可以应用在下列哪些领域（）A.搜索引擎B.天气预报C.商业营销D.自动驾驶汽车9【多选题】 (5分)根据训练数据是否拥有标记信息，学习任务可以分为（）。

A.分类B.聚类C.无监督D.回归E.半监督F.监督10【判断题】 (5分)演绎是从一般到特殊的"特化"过程，即从基础原理推演出具体状况A.对B.错11【判断题】 (5分)分类预测的是离散值A.错B.对12【判断题】 (5分)分类和回归是无监督学习A.对B.错13【判断题】 (5分)奥卡姆剃刀原则：即“若有多个假设与观察一致，选最简单的一个”。

人工智能课后习题第6章参考答案

第6章不确定性推理参考答案6.8 设有如下一组推理规则:r1: IF E1THEN E2 (0.6)r2: IF E2AND E3THEN E4 (0.7)r3: IF E4THEN H (0.8)r4: IF E5THEN H (0.9)且已知CF(E1)=0.5, CF(E3)=0.6, CF(E5)=0.7。

求CF(H)=?解：(1) 先由r1求CF(E2)CF(E2)=0.6 × max{0,CF(E1)}=0.6 × max{0,0.5}=0.3(2) 再由r2求CF(E4)CF(E4)=0.7 × max{0, min{CF(E2 ), CF(E3 )}}=0.7 × max{0, min{0.3, 0.6}}=0.21(3) 再由r3求CF1(H)CF1(H)= 0.8 × max{0,CF(E4)}=0.8 × max{0, 0.21)}=0.168(4) 再由r4求CF2(H)CF2(H)= 0.9 ×max{0,CF(E5)}=0.9 ×max{0, 0.7)}=0.63(5) 最后对CF1(H )和CF2(H)进行合成，求出CF(H)CF(H)= CF1(H)+CF2(H)+ CF1(H) × CF2(H)=0.6926.10 设有如下推理规则r1: IF E1THEN (2, 0.00001) H1r2: IF E2THEN (100, 0.0001) H1r3: IF E3THEN (200, 0.001) H2r4: IF H1THEN (50, 0.1) H2且已知P(E1)= P(E2)= P(H3)=0.6, P(H1)=0.091, P(H2)=0.01, 又由用户告知：P(E1| S1)=0.84, P(E2|S2)=0.68, P(E3|S3)=0.36请用主观Bayes方法求P(H2|S1, S2, S3)=?解：(1) 由r1计算O(H1| S1)先把H1的先验概率更新为在E1下的后验概率P(H1| E1)P(H1| E1)=(LS1× P(H1)) / ((LS1-1) × P(H1)+1)=(2 × 0.091) / ((2 -1) × 0.091 +1)=0.16682由于P(E1|S1)=0.84 > P(E1)，使用P(H | S)公式的后半部分，得到在当前观察S1下的后验概率P(H1| S1)和后验几率O(H1| S1)P(H1| S1) = P(H1) + ((P(H1| E1) – P(H1)) / (1 - P(E1))) × (P(E1| S1) – P(E1))= 0.091 + (0.16682 –0.091) / (1 – 0.6)) × (0.84 – 0.6)=0.091 + 0.18955 × 0.24 = 0.136492O(H1| S1) = P(H1| S1) / (1 - P(H1| S1))= 0.15807(2) 由r2计算O(H1| S2)先把H1的先验概率更新为在E2下的后验概率P(H1| E2)P(H1| E2)=(LS2×P(H1)) / ((LS2-1) × P(H1)+1)=(100 × 0.091) / ((100 -1) × 0.091 +1)=0.90918由于P(E2|S2)=0.68 > P(E2)，使用P(H | S)公式的后半部分，得到在当前观察S2下的后验概率P(H1| S2)和后验几率O(H1| S2)P(H1| S2) = P(H1) + ((P(H1| E2) – P(H1)) / (1 - P(E2))) × (P(E2| S2) – P(E2))= 0.091 + (0.90918 –0.091) / (1 – 0.6)) × (0.68 – 0.6)=0.25464O(H1| S2) = P(H1| S2) / (1 - P(H1| S2))=0.34163(3) 计算O(H1| S1,S2)和P(H1| S1,S2)先将H1的先验概率转换为先验几率O(H1) = P(H1) / (1 - P(H1)) = 0.091/(1-0.091)=0.10011再根据合成公式计算H1的后验几率O(H1| S1,S2)= (O(H1| S1) / O(H1)) × (O(H1| S2) / O(H1)) × O(H1)= (0.15807 / 0.10011) × (0.34163) / 0.10011) × 0.10011= 0.53942再将该后验几率转换为后验概率P(H1| S1,S2) = O(H1| S1,S2) / (1+ O(H1| S1,S2))= 0.35040(4) 由r3计算O(H2| S3)先把H2的先验概率更新为在E3下的后验概率P(H2| E3)P(H2| E3)=(LS3× P(H2)) / ((LS3-1) × P(H2)+1)=(200 × 0.01) / ((200 -1) × 0.01 +1)=0.09569由于P(E3|S3)=0.36 < P(E3)，使用P(H | S)公式的前半部分，得到在当前观察S3下的后验概率P(H2| S3)和后验几率O(H2| S3)P(H2| S3) = P(H2 | ¬ E3) + (P(H2) – P(H2| ¬E3)) / P(E3)) × P(E3| S3)由当E3肯定不存在时有P(H2 | ¬ E3) = LN3× P(H2) / ((LN3-1) × P(H2) +1)= 0.001 × 0.01 / ((0.001 - 1) × 0.01 + 1)= 0.00001因此有P(H2| S3) = P(H2 | ¬ E3) + (P(H2) – P(H2| ¬E3)) / P(E3)) × P(E3| S3)=0.00001+((0.01-0.00001) / 0.6) × 0.36=0.00600O(H2| S3) = P(H2| S3) / (1 - P(H2| S3))=0.00604(5) 由r4计算O(H2| H1)先把H2的先验概率更新为在H1下的后验概率P(H2| H1)P(H2| H1)=(LS4× P(H2)) / ((LS4-1) × P(H2)+1)=(50 × 0.01) / ((50 -1) × 0.01 +1)=0.33557由于P(H1| S1,S2)=0.35040 > P(H1)，使用P(H | S)公式的后半部分，得到在当前观察S1,S2下H2的后验概率P(H2| S1,S2)和后验几率O(H2| S1,S2)P(H2| S1,S2) = P(H2) + ((P(H2| H1) – P(H2)) / (1 - P(H1))) × (P(H1| S1,S2) – P(H1))= 0.01 + (0.33557 –0.01) / (1 – 0.091)) × (0.35040 – 0.091)=0.10291O(H2| S1,S2) = P(H2| S1, S2) / (1 - P(H2| S1, S2))=0.10291/ (1 - 0.10291) = 0.11472(6) 计算O(H2| S1,S2,S3)和P(H2| S1,S2,S3)先将H2的先验概率转换为先验几率O(H2) = P(H2) / (1 - P(H2) )= 0.01 / (1-0.01)=0.01010再根据合成公式计算H1的后验几率O(H2| S1,S2,S3)= (O(H2| S1,S2) / O(H2)) × (O(H2| S3) / O(H2)) ×O(H2)= (0.11472 / 0.01010) × (0.00604) / 0.01010) × 0.01010=0.06832再将该后验几率转换为后验概率P(H2| S1,S2,S3) = O(H1| S1,S2,S3) / (1+ O(H1| S1,S2,S3))= 0.06832 / (1+ 0.06832) = 0.06395可见，H2原来的概率是0.01，经过上述推理后得到的后验概率是0.06395，它相当于先验概率的6倍多。

哈工大机器学习_丁宇新_第六章答案

Machine Learning Chapter614S051053-汪洋6.1Ans :We can get that when two of patients do laboratory tests are positive, cancer and ⌝cancer posterior probability can be expressed as: P(canner|+，+)，P(⌝cancer|+，+)。

That last is a two test because it is independent of each other, so:P(+,+|cancer)=P(+|cancer)P(+|cancer)Also we can get that:P(+|cancer) P(+|cancer) P(cancer)=0.98*0.98*0.008=0.0076832P(+|⌝cancer) P(+|⌝cancer) P(⌝cancer)=0.03*0.03*0.992=0.0008928P(+,+) = P(+,+|cancer) P(cancer) + P(+,+|⌝cancer)P(⌝cancer)=0.0076832+0.0008928=0.008576So ：P(canner|+，+)＝0.0076832/0.008576=0.895896P(⌝cancer|+，+)=0.1041046.2Ans : From Bayesian formula, )()()|()|(++=+P cancer P cancer P cancer P Because the event cancer and ⌝cancer mutex, and P(cancer)+P(⌝cancer)=1,A full probability formula is available: P(+) = P(+|cancer) P(cancer) + P(+|⌝cancer)P(⌝cancer) So: )P(|P( P(cancer) cancer)|P( )()|()()()|()|(cancer cancer cancer P cancer P P cancer P cancer P cancer P ⌝⌝++++=++=+ So the normalization method is right.6.3Ans :(a) P(h): If we assume that H1 is more general than H2,assign P （h1）>=P(h2)),()()|()|(),()()|,(),|(++++=++++=++P cancer P cancer P cancer P P cancer P cancer P cancer P ),()()|()|(),()()|,(),|(++⌝⌝+⌝+=++⌝⌝++=++⌝P cancer P cancer P cancer P P cancer P cancer P cancer P ⎩⎨⎧=∀=otherwise x h d d h D P i i i )(,01)|((b) P(h): If we assume that H1 is more general than H2,assign P （h1）<=P(h2)(c) P(h) : Arbitrary assumptions hi and hj, P(hi)=P(hj)=||1H6.6Ans :In the naive Bayesian classification, at a given target value V, the attributes are independent of each other, the Bayesian network shown below, the direction of the arrow is from top to bottom. Because the attribute wind is independent of other attributes, there is no attribute associatedwith it. ⎩⎨⎧=∀=otherwise x h d d h D P i i i )(,01)|(⎩⎨⎧=∀=otherwise x h d d h D P i i i )(,01)|(。

机器学习作业参考答案

Strong
第Байду номын сангаас4 页（共 6 页）
2010-2011-1 学期机器学习作业参考答案
Wate
Forecast
Warm
Same
Yes
Yes
G2: ?
表示对所有例子都接受为正例
在把候选消除算法应用到决策树假设空间时，预计会遇到如下四种困难：（1）在把候选消除算法应用到决策树假设空间时，如果目标函数不在假设空间时，侯选消除算法得到的变型空间是空的，或者当遇到含有噪声的数据时，候选消除算法也可能出现空集合，而如果用 ID3 建立决策树则不会出现这种情况。（2）如果一个属性的值比较多，一棵决策树将会很宽，（3）如何精化 S 中的树而不比 G 中的树更加一般化是一个困难，反之，如何精化 G 中的树而不比 S 中的树更加特殊化也是一个困难，另外，要由 S 和 G 求出中间的合理决策树是十分困难的。其原因都是因为不同形状的决策树可以等价。如果在修改时不进行标准化，那么在构造时就会出现麻烦。（4）可能要建立候选决策树的重复信息很多，在选择一棵好的决策树时，计算量会很大， 4.9 不能。存在隐藏单元权值能产生书中建议的隐藏单元编码（0.1,0.2,…,0.8），例如设定输入端到隐含层的权值分别为 0.1,0.2,0.3,…,0.8。但是不存在输出单元权值能正确解码这样的输入编码。设隐含层到第 i 个输出节点的权重为 i 。以第 1 个输出节点为例，激活函数为 sigmoid 函数。当输入为 00000001 时，隐含层输出为 0.1,要使 output 接近 1， 1 必大于零；当输入不是 00000001 时，要使 output=sigmoid（ net1 ）接近 0， net1 要小于零，从而 1 必小于零。 1 的取值有矛盾。所以不能通过仅有的一个隐含节点学习恒等函数。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

P(cancer | ,) P(, | cancer ) P(cancer ) P( | cancer ) P( | cancer ) P(cancer ) P(,) P(,)
P(+|cancer) P(+|cancer) P(cancer)=0.98*0.98*0.008=0.0076832 P(+|cancer) P(+|cancer) P(cancer)=0.03*0.03*0.992=0.0008928 P(+,+) = P(+,+|cancer) P(cancer) + P(+,+|cancer)P(cancer) =0.0076832+0.0008928=0.008576 So： P(canner|+，+)＝0.0076832/0.008576=0.895896 P(cancer|+，+)=0.104104 6.2 Ans: From Bayesian formula, P(cancer | ) P( | cancer ) P(cancer )
P( ) P( | cancer ) P(cancer ) P( | cancer) P(cancer) P( | cancer P(cancer )
So the normalization method is right. 6.3 Ans: (a) P(h): If we assume that H1 is more general than H2,assign P（h1）>=P(h2)
(c) P(h) : Arbitrary assumptions hi and hj, P(hi)=P(hj)=
1 d i , d i h( xi ) P |H |
6.6 Ans: In the naive Bayesian classification, at a given target value V, the attributes are independent of each other, the Bayesian network shown below, the direction of the arrow is from top to bottom. Because the attribute wind is independent of other attributes, there is no attribute associated with it.
P(cancer | ,) P(, | cancer ) P(cancer ) P( | cancer ) P( | cancer ) P(cancer ) P(,) P(,)
That last is a two test because it is independent of each other, so: P(+,+|cancer)=P(+|cancer)P(+|cancer) Also we can get that:
P( )
Because the event cancer and cancer mutex, and P(cancer)+P(cancer)=1, A full probability formula is available: P(+) = P(+|cancer) P(cancer) + P(+|cancer)P(cancer) So: P(cancer | ) P( | cancer ) P(cancer )
1 d i , d i h( xi ) P ( D | h) otherwise 0
(b) P(h): If we assume that H1 is more general than H2,assign P（h1）<=P(h2)
1 d i , d i h( xi ) P ( D | h) otherwise 0
Machine Learning Chapter6
14S051053-汪洋
6.1 Ans: We can get that when two of patients do laboratory tests are positive, cancer and cancer posterior probability can be expressed as: P(canner|+，+)，P(cancer|+，+)。