Stanford University Open Course: Machine Learning course
Stanford 2017 machine learning / artificial intelligence course lecture notes: bolcskei-stats385-slides

Why non-linear feature extractors?

Task: separate two categories of data through a linear classifier, i.e., assign a point f to one class if ⟨w, f⟩ > 0 and to the other if ⟨w, f⟩ < 0. For the data shown on the slide this is not possible. After a non-linear feature map Φ, however, classifying by the sign of ⟨w, Φ(f)⟩ (one class for ⟨w, Φ(f)⟩ > 0, the other for ⟨w, Φ(f)⟩ < 0) becomes possible with a suitable choice of w.
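A minimal sketch of the same point, with data and feature map chosen purely for illustration (they are not the example from the slides): a one-dimensional data set that no linear rule sign(⟨w, f⟩) can separate becomes linearly separable after the map Φ(f) = (f, f², 1).

```python
import numpy as np

# Illustrative 1-D data (not from the slides): class +1 clusters around the
# origin, class -1 lies further out, so no single linear rule on f separates them.
f = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
y = np.where(np.abs(f) < 1.0, 1, -1)

# No linear rule sign(w * f) works: any w > 0 misclassifies one side, w < 0 the other.
for w in (1.0, -1.0):
    print("linear, w =", w, "accuracy:", np.mean(np.sign(w * f) == y))

# Non-linear feature map Phi(f) = (f, f^2, 1); with w = (0, -1, 1) the rule
# sign(<w, Phi(f)>) = sign(1 - f^2) separates the two classes perfectly.
Phi = np.stack([f, f**2, np.ones_like(f)], axis=1)
w = np.array([0.0, -1.0, 1.0])
print("non-linear accuracy:", np.mean(np.sign(Phi @ w) == y))
```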
Why non-linear feature extractors?
[ImageNet example images: ski, rock, plant, coffee.] CNNs win the ImageNet 2015 challenge [He et al., 2015]: describing the content of an image.

Translation invariance
[Handwritten digits from the MNIST database [LeCun & Cortes, 1998].] The feature vector should be invariant to spatial location ⇒ translation invariance.
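One concrete (and purely illustrative, not taken from the slides) way to see what translation invariance of a feature vector means: the modulus of the discrete Fourier transform does not change under circular shifts of the input, so features built from it do not depend on spatial location.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)          # a toy 1-D "image"
x_shifted = np.roll(x, 7)            # the same content, translated by 7 samples

# |FFT| is unchanged by circular translation, so features built from it are
# translation invariant; the raw signals themselves are clearly different.
feat = np.abs(np.fft.fft(x))
feat_shifted = np.abs(np.fft.fft(x_shifted))
print("raw signals equal:   ", np.allclose(x, x_shifted))          # False
print("|FFT| features equal:", np.allclose(feat, feat_shifted))    # True
```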
CS229 Stanford University machine learning course, Supplemental notes 4 - Hoeffding

John Duchi
1 Basic probability bounds
A basic question in probability, statistics, and machine learning is the following: given a random variable Z with expectation E[Z], how likely is Z to be close to its expectation? And more precisely, how close is it likely to be? With that in mind, these notes give a few tools for computing bounds of the form

$$P(Z \ge \mathbb{E}[Z] + t) \quad\text{and}\quad P(Z \le \mathbb{E}[Z] - t) \tag{1}$$

for t ≥ 0. Our first bound is perhaps the most basic of all probability inequalities, and it is known as Markov's inequality. Given its basic-ness, it is perhaps unsurprising that its proof is essentially only one line.

Proposition 1 (Markov's inequality). Let Z ≥ 0 be a non-negative random variable. Then for all t ≥ 0,

$$P(Z \ge t) \le \frac{\mathbb{E}[Z]}{t}.$$
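The notes call the proof essentially one line; the standard argument (reconstructed here rather than quoted from the notes) is that Z ≥ t·1{Z ≥ t} pointwise, so taking expectations gives the bound:

```latex
% One-line proof of Markov's inequality (Z >= 0, t > 0):
\mathbb{E}[Z] \;\ge\; \mathbb{E}\bigl[t \cdot \mathbf{1}\{Z \ge t\}\bigr]
             \;=\; t \, P(Z \ge t)
\quad\Longrightarrow\quad
P(Z \ge t) \;\le\; \frac{\mathbb{E}[Z]}{t}.
```

Bounds of the form (1) are then obtained by applying this inequality not to Z itself but to a suitable non-negative function of Z − E[Z], for example (Z − E[Z])², which gives Chebyshev's inequality, or exp(λ(Z − E[Z])), which gives Chernoff-style bounds.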
An introduction to all of Stanford University's artificial intelligence courses

List of related AI Classes

CS229 covered a broad swath of topics in machine learning, compressed into a single quarter. Machine learning is a hugely inter-disciplinary topic, and there are many other sub-communities of AI working on related topics, or working on applying machine learning to different problems. Stanford has one of the best and broadest sets of AI courses of pretty much any university. It offers a wide range of classes, covering most of the scope of AI issues. Here are some classes in which you can learn more about topics related to CS229:

AI Overview
- CS221 (Aut): Artificial Intelligence: Principles and Techniques. Broad overview of AI and applications, including robotics, vision, NLP, search, Bayesian networks, and learning. Taught by Professor Andrew Ng.

Robotics
- CS223A (Win): Robotics from the perspective of building the robot and controlling it; focus on manipulation. Taught by Professor Oussama Khatib (who builds the big robots in the Robotics Lab).
- CS225A (Spr): A lab course from the same perspective, taught by Professor Khatib.
- CS225B (Aut): A lab course where you get to play around with making mobile robots navigate in the real world. Taught by Dr. Kurt Konolige (SRI).
- CS277 (Spr): Experimental Haptics. Teaches haptics programming and touch feedback in virtual reality. Taught by Professor Ken Salisbury, who works on robot design, haptic devices/teleoperation, robotic surgery, and more.
- CS326A (Latombe): Motion planning. An algorithmic robot motion planning course, by Professor Jean-Claude Latombe, who (literally) wrote the book on the topic.

Knowledge Representation & Reasoning
- CS222 (Win): Logical knowledge representation and reasoning. Taught by Professor Yoav Shoham and Professor Johan van Benthem.
- CS227 (Spr): Algorithmic methods such as search, CSP, planning. Taught by Dr. Yorke-Smith (SRI).

Probabilistic Methods
- CS228 (Win): Probabilistic models in AI. Bayesian networks, hidden Markov models, and planning under uncertainty. Taught by Professor Daphne Koller, who works on computational biology, Bayes nets, learning, computational game theory, and more.

Perception & Understanding
- CS223B (Win): Introduction to computer vision. Algorithms for processing and interpreting image or camera information. Taught by Professor Sebastian Thrun, who led the DARPA Grand Challenge/DARPA Urban Challenge teams, or Professor Jana Kosecka, who works on vision and robotics.
- CS224S (Win): Speech recognition and synthesis. Algorithms for large vocabulary continuous speech recognition, text-to-speech, conversational dialogue agents. Taught by Professor Dan Jurafsky, who co-authored one of the two most-used textbooks on NLP.
- CS224N (Spr): Natural language processing, including parsing, part of speech tagging, information extraction from text, and more. Taught by Professor Chris Manning, who co-authored the other of the two most-used textbooks on NLP.
- CS224U (Win): Natural language understanding, including computational semantics and pragmatics, with application to question answering, summarization, and inference. Taught by Professors Dan Jurafsky and Chris Manning.

Multi-agent systems
- CS224M (Win): Multi-agent systems, including game theoretic foundations, designing systems that induce agents to coordinate, and multi-agent learning. Taught by Professor Yoav Shoham, who works on economic models of multi-agent interactions.
- CS227B (Spr): General game playing. Reasoning and learning methods for playing any of a broad class of games. Taught by Professor Michael Genesereth, who works on computational logic, enterprise management and e-commerce.

Convex Optimization
- EE364A (Win): Convex Optimization. Convexity, duality, convex programs, interior point methods, algorithms. Taught by Professor Stephen Boyd, who works on optimization and its application to engineering problems.

AI Project courses
- CS294B/CS294W (Win): STAIR (STanford AI Robot) project. Project course with no lectures. By drawing from machine learning and all other areas of AI, we'll work on the challenge problem of building a general-purpose robot that can carry out home and office chores, such as tidying up a room, fetching items, and preparing meals. Taught by Professor Andrew Ng.
CS231n course outline

CS231n is a computer vision course offered by Stanford University. A detailed outline of the course follows:

Lecture 1: Overview of computer vision, its historical background, and the course plan.
Lecture 2: Image classification, including the data-driven approach, k-nearest neighbors, and linear classification.
Lecture 3: Loss functions and optimization, in three parts: 1. continuing the previous lecture's discussion of linear classification; 2. higher-level representations and properties of images; 3. optimization and stochastic gradient descent.
Lecture 4: Neural networks, including the classic backpropagation algorithm, multi-layer perceptron architectures, and the neuron-level view.
Lecture 5: Convolutional neural networks, in three parts: 1. the history and development of CNNs; 2. convolution and pooling; 3. how well ConvNets work in practice.
Lecture 6: Training neural networks, part I: activation functions, data preprocessing, weight initialization, batch normalization, and hyperparameter optimization.
Lecture 7: Training neural networks, part II: optimization methods, model ensembles, regularization, data augmentation, and transfer learning.
Lecture 8: Deep learning software: a detailed comparison of CPUs and GPUs, concrete examples in TensorFlow, Theano, PyTorch, Torch and Caffe, and a comparison of the frameworks and their use cases.
Lecture 9: CNN architectures: from LeNet-5 through AlexNet, VGG, GoogLeNet, ResNet and others, describing the architectures and principles of convolutional networks from theory to concrete examples.
Lecture 10: Recurrent neural networks: first the architectures and principles of RNNs, LSTMs and GRUs, then further discussion of these models through language modeling, image captioning, and visual question answering.
Lecture 11: Detection and segmentation: computer vision tasks beyond image classification, such as semantic segmentation, object detection and instance segmentation, together with a detailed look at architectures such as R-CNN, Fast R-CNN and Mask R-CNN.
Lecture 12: Visualizing and understanding: feature visualization and inversion, as well as adversarial examples and techniques such as DeepDream.
Classic AI courses from abroad

Artificial intelligence is a hot topic in today's technology landscape, and many classic courses from abroad cover its various areas and techniques. Below I list some of these classic AI courses; they cover the foundations, algorithms, and applications of artificial intelligence.

1. Stanford University - CS229: Machine Learning. Taught by Professor Andrew Ng of Stanford University, this is a classic of the machine learning field. The course covers supervised learning, unsupervised learning, reinforcement learning, and a range of other machine learning algorithms and methods.

2. Massachusetts Institute of Technology - 6.034: Artificial Intelligence. Taught by Professor Patrick Henry Winston of MIT, this course covers the foundations of AI, reasoning and planning, and perception and learning. By walking through classic AI methods and case studies, it helps students understand the core concepts and techniques of artificial intelligence.

3. University of California, Berkeley - CS188: Introduction to Artificial Intelligence. Berkeley's classic AI course, covering search, planning, machine learning, natural language processing, and more. Through lectures and hands-on projects, it builds students' AI programming skills and their ability to solve real problems.

4. Carnegie Mellon University - 10-701: Introduction to Machine Learning. Taught by Professor Tom Mitchell of CMU, this course introduces the basic theory and algorithms of machine learning. It covers statistical learning theory and supervised and unsupervised learning methods, with the goal of helping students understand the principles and applications of machine learning.

5. University of Washington - CSE 446: Machine Learning. Taught by Professor Pedro Domingos of the University of Washington, this course covers the basic concepts, algorithms, and applications of machine learning.
Lesson plan for a Stanford University open course

I. Course background

With the rapid development of the Internet, natural language processing (NLP) technology is being applied ever more widely across many fields. Based on Stanford University's open course on natural language processing, this course aims to help students understand the basic concepts, techniques, and applications of NLP, and to develop their ability to apply NLP techniques in real projects.

II. Course objectives
1. Understand the basic concepts and subfields of NLP;
2. Master the main techniques and methods of NLP;
3. Become familiar with how NLP is applied in real projects;
4. Improve students' ability to solve practical problems.

III. Course content
1. Lecture 1: NLP overview - the definition and history of NLP; the main tasks and challenges of NLP; applications of NLP across different fields.
2. Lecture 2: Text preprocessing - text cleaning; word segmentation; part-of-speech tagging.
3. Lecture 3: Word vector representations - word vector techniques; common word vector models; applications of word vectors in NLP.
4. Lecture 4: Part-of-speech tagging and named entity recognition - part-of-speech tagging techniques; named entity recognition techniques; common named entity recognition models.
5. Lecture 5: Syntactic parsing - parsing techniques; common parsing models; applications of parsing in NLP.
6. Lecture 6: Semantic analysis - semantic analysis techniques; common semantic analysis models; applications of semantic analysis in NLP.
7. Lecture 7: Sentiment analysis - sentiment analysis techniques; common sentiment analysis models; applications of sentiment analysis in NLP.
8. Lecture 8: Machine translation - machine translation techniques; common machine translation models; applications of machine translation in NLP.
9. Lecture 9: NLP in real projects - NLP in search engines; NLP in intelligent customer service; NLP in recommender systems.
10. Lecture 10: Course summary and outlook - course review; trends in NLP technology; student discussion and Q&A.

IV. Teaching methods
1. Lecturing: the instructor explains the basic concepts, techniques, and applications of NLP;
2. Case analysis: real cases are analyzed to help students understand how NLP techniques are applied;
3. Hands-on practice: homework exercises let students practice NLP techniques themselves;
4. Discussion: classroom discussions are organized to sharpen students' thinking.

V. Assessment
1. Class participation: students' in-class performance, including asking and answering questions;
2. Homework: checks students' mastery of NLP techniques;
3. Final exam: a comprehensive evaluation of students' knowledge of NLP and their ability to apply it.
Recommended training courses for learning AI and machine learning

Artificial intelligence (AI) and machine learning (ML) are widely regarded as among the most promising directions in technology. As the technology advances and its applications spread, more and more people have developed a strong interest in learning AI and machine learning. However, choosing the course that suits you best from the vast number of training courses available troubles many people. This article recommends several well-regarded training courses in AI and machine learning.

1. Coursera - Machine Learning (Stanford University). Coursera is a well-known online learning platform offering many high-quality courses. Stanford's Machine Learning course is one of the most highly rated and recommended courses on Coursera. Taught by the renowned computer scientist Andrew Ng, it covers the fundamentals of machine learning, commonly used algorithms, and practical applications. With a clear teaching style and rich hands-on projects, it helps learners gradually understand and apply machine learning methods and techniques.

2. Udacity - AI Engineer Nanodegree. Udacity is another well-known online learning platform; its AI Engineer Nanodegree is a comprehensive and in-depth AI training program. It covers several directions in AI and machine learning, including deep learning, computer vision, and natural language processing. Learners apply what they have learned through hands-on projects and interact with mentors and other students, deepening their understanding of and practical experience in the AI field.

3. MIT - Artificial Intelligence: Foundations and Frontiers. The Massachusetts Institute of Technology (MIT) also offers many high-quality AI courses. Among them, Artificial Intelligence: Foundations and Frontiers gives learners an opportunity to explore AI in depth. Starting from the history of AI, it introduces basic concepts, core algorithms, and key techniques. Learners will see application cases of AI across different fields, as well as current frontier research and development trends in the AI field.
Stanford University CS229 machine learning notes 12
CS229 Lecture notes
Andrew Ng

Part XIII
Reinforcement Learning and Control

We now begin our study of reinforcement learning and adaptive control. In supervised learning, we saw algorithms that tried to make their outputs mimic the labels y given in the training set. In that setting, the labels gave an unambiguous "right answer" for each of the inputs x. In contrast, for many sequential decision making and control problems, it is very difficult to provide this type of explicit supervision to a learning algorithm. For example, if we have just built a four-legged robot and are trying to program it to walk, then initially we have no idea what the "correct" actions to take are to make it walk, and so do not know how to provide explicit supervision for a learning algorithm to try to mimic.

In the reinforcement learning framework, we will instead provide our algorithms only a reward function, which indicates to the learning agent when it is doing well, and when it is doing poorly. In the four-legged walking example, the reward function might give the robot positive rewards for moving forwards, and negative rewards for either moving backwards or falling over. It will then be the learning algorithm's job to figure out how to choose actions over time so as to obtain large rewards.

Reinforcement learning has been successful in applications as diverse as autonomous helicopter flight, robot legged locomotion, cell-phone network routing, marketing strategy selection, factory control, and efficient web-page indexing. Our study of reinforcement learning will begin with a definition of the Markov decision process (MDP), which provides the formalism in which RL problems are usually posed.

1 Markov decision processes

A Markov decision process is a tuple (S, A, {P_sa}, γ, R), where:

- S is a set of states. (For example, in autonomous helicopter flight, S might be the set of all possible positions and orientations of the helicopter.)
- A is a set of actions. (For example, the set of all possible directions in which you can push the helicopter's control sticks.)
- P_sa are the state transition probabilities. For each state s ∈ S and action a ∈ A, P_sa is a distribution over the state space. We'll say more about this later, but briefly, P_sa gives the distribution over what states we will transition to if we take action a in state s.
- γ ∈ [0, 1) is called the discount factor.
- R : S × A → ℝ is the reward function. (Rewards are sometimes also written as a function of a state S only, in which case we would have R : S → ℝ.)

The dynamics of an MDP proceed as follows: We start in some state s_0, and get to choose some action a_0 ∈ A to take in the MDP. As a result of our choice, the state of the MDP randomly transitions to some successor state s_1, drawn according to s_1 ∼ P_{s_0 a_0}. Then, we get to pick another action a_1. As a result of this action, the state transitions again, now to some s_2 ∼ P_{s_1 a_1}. We then pick a_2, and so on. Pictorially, we can represent this process as follows:

$$s_0 \xrightarrow{a_0} s_1 \xrightarrow{a_1} s_2 \xrightarrow{a_2} s_3 \xrightarrow{a_3} \cdots$$

Upon visiting the sequence of states s_0, s_1, ... with actions a_0, a_1, ..., our total payoff is given by

$$R(s_0, a_0) + \gamma R(s_1, a_1) + \gamma^2 R(s_2, a_2) + \cdots$$

Or, when we are writing rewards as a function of the states only, this becomes

$$R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots$$

For most of our development, we will use the simpler state-rewards R(s), though the generalization to state-action rewards R(s, a) offers no special difficulties.

Our goal in reinforcement learning is to choose actions over time so as to maximize the expected value of the total payoff:

$$\mathbb{E}\left[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots\right]$$

Note that the reward at timestep t is discounted by a factor of γ^t. Thus, to
make this expectation large, we would like to accrue positive rewards as soon as possible (and postpone negative rewards as long as possible). In economic applications where R(·) is the amount of money made, γ also has a natural interpretation in terms of the interest rate (where a dollar today is worth more than a dollar tomorrow).

A policy is any function π : S → A mapping from the states to the actions. We say that we are executing some policy π if, whenever we are in state s, we take action a = π(s). We also define the value function for a policy π according to

$$V^\pi(s) = \mathbb{E}\bigl[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots \mid s_0 = s, \pi\bigr].$$

V^π(s) is simply the expected sum of discounted rewards upon starting in state s, and taking actions according to π. (This notation in which we condition on π isn't technically correct because π isn't a random variable, but it is quite standard in the literature.)

Given a fixed policy π, its value function V^π satisfies the Bellman equations:

$$V^\pi(s) = R(s) + \gamma \sum_{s' \in S} P_{s\pi(s)}(s')\, V^\pi(s').$$

This says that the expected sum of discounted rewards V^π(s) for starting in s consists of two terms: first, the immediate reward R(s) that we get right away simply for starting in state s, and second, the expected sum of future discounted rewards. Examining the second term in more detail, we see that the summation term above can be rewritten $\mathbb{E}_{s' \sim P_{s\pi(s)}}[V^\pi(s')]$. This is the expected sum of discounted rewards for starting in state s', where s' is distributed according to P_{sπ(s)}, which is the distribution over where we will end up after taking the first action π(s) in the MDP from state s. Thus, the second term above gives the expected sum of discounted rewards obtained after the first step in the MDP.

Bellman's equations can be used to efficiently solve for V^π. Specifically, in a finite-state MDP (|S| < ∞), we can write down one such equation for V^π(s) for every state s. This gives us a set of |S| linear equations in |S| variables (the unknown V^π(s)'s, one for each state), which can be efficiently solved for the V^π(s)'s.

We also define the optimal value function according to

$$V^*(s) = \max_\pi V^\pi(s). \tag{1}$$

In other words, this is the best possible expected sum of discounted rewards that can be attained using any policy. There is also a version of Bellman's equations for the optimal value function:

$$V^*(s) = R(s) + \max_{a \in A} \gamma \sum_{s' \in S} P_{sa}(s')\, V^*(s'). \tag{2}$$

The first term above is the immediate reward as before. The second term is the maximum over all actions a of the expected future sum of discounted rewards we'll get after taking action a. You should make sure you understand this equation and see why it makes sense.

We also define a policy π* : S → A as follows:

$$\pi^*(s) = \arg\max_{a \in A} \sum_{s' \in S} P_{sa}(s')\, V^*(s'). \tag{3}$$

Note that π*(s) gives the action a that attains the maximum in the "max" in Equation (2).

It is a fact that for every state s and every policy π, we have

$$V^*(s) = V^{\pi^*}(s) \ge V^\pi(s).$$

The first equality says that V^{π*}, the value function for π*, is equal to the optimal value function V* for every state s. Further, the inequality above says that π*'s value is at least as large as the value of any other policy. In other words, π* as defined in Equation (3) is the optimal policy.

Note that π* has the interesting property that it is the optimal policy for all states s. Specifically, it is not the case that if we were starting in some state s then there'd be some optimal policy for that state, and if we were starting in some other state s' then there'd be some other policy that's optimal for s'. The same policy π* attains the maximum in Equation (1) for all states s. This means that we can use the same policy π* no matter what the initial state of our MDP is.
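Since the Bellman equations for a fixed policy are just |S| linear equations in the |S| unknowns V^π(s), they can be solved directly. Below is a minimal Python sketch of that computation; the encoding of the MDP, with P as an |S| × |A| × |S| array, is an illustrative assumption, not something the notes specify.

```python
import numpy as np

def policy_value(P, R, gamma, pi):
    """Solve the Bellman equations V = R + gamma * P_pi @ V for a fixed policy.

    P     : array of shape (S, A, S), P[s, a, s2] = P_sa(s2)
    R     : array of shape (S,), state-only rewards R(s)
    gamma : discount factor in [0, 1)
    pi    : integer array of shape (S,), pi[s] = action taken in state s
    """
    S = R.shape[0]
    P_pi = P[np.arange(S), pi]                 # shape (S, S): row s is P_{s, pi(s)}
    # (I - gamma * P_pi) V = R is a set of |S| linear equations in |S| unknowns.
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R)
```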
2 Value iteration and policy iteration

We now describe two efficient algorithms for solving finite-state MDPs. For now, we will consider only MDPs with finite state and action spaces (|S| < ∞, |A| < ∞).

The first algorithm, value iteration, is as follows:

1. For each state s, initialize V(s) := 0.
2. Repeat until convergence {
       For every state, update
       $$V(s) := R(s) + \max_{a \in A} \gamma \sum_{s' \in S} P_{sa}(s')\, V(s').$$
   }

This algorithm can be thought of as repeatedly trying to update the estimated value function using Bellman Equation (2). There are two possible ways of performing the updates in the inner loop of the algorithm. In the first, we can first compute the new values for V(s) for every state s, and then overwrite all the old values with the new values. This is called a synchronous update. In this case, the algorithm can be viewed as implementing a "Bellman backup operator" that takes a current estimate of the value function, and maps it to a new estimate. (See homework problem for details.) Alternatively, we can also perform asynchronous updates. Here, we would loop over the states (in some order), updating the values one at a time.

Under either synchronous or asynchronous updates, it can be shown that value iteration will cause V to converge to V*. Having found V*, we can then use Equation (3) to find the optimal policy.

Apart from value iteration, there is a second standard algorithm for finding an optimal policy for an MDP. The policy iteration algorithm proceeds as follows:

1. Initialize π randomly.
2. Repeat until convergence {
       (a) Let V := V^π.
       (b) For each state s, let
       $$\pi(s) := \arg\max_{a \in A} \sum_{s' \in S} P_{sa}(s')\, V(s').$$
   }

Thus, the inner loop repeatedly computes the value function for the current policy, and then updates the policy using the current value function. (The policy π found in step (b) is also called the policy that is greedy with respect to V.) Note that step (a) can be done by solving Bellman's equations as described earlier, which in the case of a fixed policy is just a set of |S| linear equations in |S| variables.

After at most a finite number of iterations of this algorithm, V will converge to V* and π will converge to π*.

Both value iteration and policy iteration are standard algorithms for solving MDPs, and there isn't currently universal agreement over which algorithm is better. For small MDPs, policy iteration is often very fast and converges with very few iterations. However, for MDPs with large state spaces, solving for V^π explicitly would involve solving a large system of linear equations, and could be difficult. In these problems, value iteration may be preferred. For this reason, in practice value iteration seems to be used more often than policy iteration.
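A minimal Python sketch of both algorithms, using the same (S, A, S)-shaped transition array as before and reusing the policy_value helper from the previous sketch; this is an illustrative implementation of the pseudocode above, not code from the course.

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-8):
    """Synchronous value iteration: V(s) := R(s) + max_a gamma * sum_s2 P_sa(s2) V(s2)."""
    S = R.shape[0]
    V = np.zeros(S)
    while True:
        Q = R[:, None] + gamma * P @ V        # Q[s, a] = R(s) + gamma * sum_s2 P[s,a,s2] V(s2)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)    # approximate V*, and the greedy (optimal) policy
        V = V_new

def policy_iteration(P, R, gamma):
    """Alternate exact policy evaluation (a linear solve) with greedy policy improvement."""
    S, A, _ = P.shape
    pi = np.zeros(S, dtype=int)
    while True:
        V = policy_value(P, R, gamma, pi)     # step (a): V := V^pi via Bellman's equations
        pi_new = (P @ V).argmax(axis=1)       # step (b): policy greedy with respect to V
        if np.array_equal(pi_new, pi):
            return V, pi
        pi = pi_new
```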
3 Learning a model for an MDP

So far, we have discussed MDPs and algorithms for MDPs assuming that the state transition probabilities and rewards are known. In many realistic problems, we are not given state transition probabilities and rewards explicitly, but must instead estimate them from data. (Usually, S, A and γ are known.)

For example, suppose that, for the inverted pendulum problem (see problem set 4), we had a number of trials in the MDP, that proceeded as follows:

$$s_0^{(1)} \xrightarrow{a_0^{(1)}} s_1^{(1)} \xrightarrow{a_1^{(1)}} s_2^{(1)} \xrightarrow{a_2^{(1)}} s_3^{(1)} \xrightarrow{a_3^{(1)}} \cdots$$
$$s_0^{(2)} \xrightarrow{a_0^{(2)}} s_1^{(2)} \xrightarrow{a_1^{(2)}} s_2^{(2)} \xrightarrow{a_2^{(2)}} s_3^{(2)} \xrightarrow{a_3^{(2)}} \cdots$$

Here, s_i^{(j)} is the state we were at at time i of trial j, and a_i^{(j)} is the corresponding action that was taken from that state. In practice, each of the trials above might be run until the MDP terminates (such as if the pole falls over in the inverted pendulum problem), or it might be run for some large but finite number of timesteps.

Given this "experience" in the MDP consisting of a number of trials, we can then easily derive the maximum likelihood estimates for the state transition probabilities:

$$P_{sa}(s') = \frac{\#\text{times we took action } a \text{ in state } s \text{ and got to } s'}{\#\text{times we took action } a \text{ in state } s} \tag{4}$$

Or, if the ratio above is "0/0" (corresponding to the case of never having taken action a in state s before), then we might simply estimate P_sa(s') to be 1/|S|. (I.e., estimate P_sa to be the uniform distribution over all states.)

Note that, if we gain more experience (observe more trials) in the MDP, there is an efficient way to update our estimated state transition probabilities using the new experience. Specifically, if we keep around the counts for both the numerator and denominator terms of (4), then as we observe more trials, we can simply keep accumulating those counts. Computing the ratio of these counts then gives our estimate of P_sa.

Using a similar procedure, if R is unknown, we can also pick our estimate of the expected immediate reward R(s) in state s to be the average reward observed in state s.

Having learned a model for the MDP, we can then use either value iteration or policy iteration to solve the MDP using the estimated transition probabilities and rewards. For example, putting together model learning and value iteration, here is one possible algorithm for learning in an MDP with unknown state transition probabilities:

1. Initialize π randomly.
2. Repeat {
       (a) Execute π in the MDP for some number of trials.
       (b) Using the accumulated experience in the MDP, update our estimates for P_sa (and R, if applicable).
       (c) Apply value iteration with the estimated state transition probabilities and rewards to get a new estimated value function V.
       (d) Update π to be the greedy policy with respect to V.
   }

We note that, for this particular algorithm, there is one simple optimization that can make it run much more quickly. Specifically, in the inner loop of the algorithm where we apply value iteration, if instead of initializing value iteration with V = 0, we initialize it with the solution found during the previous iteration of our algorithm, then that will provide value iteration with a much better initial starting point and make it converge more quickly.
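A minimal sketch of the maximum likelihood estimate (4), with the 0/0 case falling back to the uniform distribution 1/|S| and the reward estimated as the average observed in each state. The trial format, a list of (state, action, next_state, reward) tuples, is an illustrative assumption, not something the notes specify.

```python
import numpy as np

def estimate_model(trials, S, A):
    """MLE of P_sa(s') and R(s) from trials given as lists of (s, a, s_next, reward)."""
    counts = np.zeros((S, A, S))
    reward_sum = np.zeros(S)
    reward_visits = np.zeros(S)
    for trial in trials:
        for s, a, s_next, r in trial:
            counts[s, a, s_next] += 1          # numerator/denominator counts of (4)
            reward_sum[s] += r
            reward_visits[s] += 1
    totals = counts.sum(axis=2, keepdims=True)             # #times action a taken in state s
    P_hat = np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / S)
    R_hat = np.where(reward_visits > 0, reward_sum / np.maximum(reward_visits, 1), 0.0)
    return P_hat, R_hat
```

Keeping the raw counts around (rather than only the ratios) is exactly what makes the incremental update described in the notes cheap: new trials just add to `counts`, and the ratio is recomputed when needed.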
[Stanford University 2014 machine learning course notes] Chapter 2 - Model representation
Our first learning algorithm is linear regression. In this section you will see an outline of the algorithm and, more importantly, get a picture of the whole supervised learning process.

Let's look at an example: based on the prices at which houses of different sizes have sold, we plot our data set. Say your friend's house is 1250 square feet, and you want to tell him how much it might sell for. One thing you can do is build a model, perhaps a straight line; from this model you might tell your friend that he can sell the house for roughly $220,000.

This is clearly an example of a supervised learning algorithm, because every example comes with a "right answer": we know the actual size and the actual selling price of each house in the data set. It is also an example of a regression problem: regression means we predict a concrete, real-valued output (the price, in this example). Note that the other most common type of supervised learning problem is called a classification problem, in which we predict a discrete-valued output; for example, looking at a tumor and trying to decide whether it is benign or malignant is a discrete output taking only the values 0 and 1.

More formally, in supervised learning we have a data set called a training set. With housing prices, we have a training set of house prices, and our job is to learn from this data how to predict prices.

Let's define some notation that will be used throughout the course. Lowercase m denotes the number of training examples; in this example, if our table has 47 rows, we have 47 training examples and m = 47. Lowercase x denotes the input variables, also called features; here x is the size of the house. Lowercase y denotes the output variable, the target variable we want to predict; here y is the house price. A pair (x, y) denotes one training example, i.e., one row of the table. We write (x^(i), y^(i)) for the i-th training example (a specific example). Note that this i is not an exponent; it is just an index into the training set, referring to the i-th row of the table. In this example, x^(1) = 2104 and y^(1) = 460.

Next, let's look at how a supervised learning algorithm works. First, we feed the training set, for example our housing-price training set, to the learning algorithm. The learning algorithm's job is to output a function, usually denoted by lowercase h.
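A minimal Python sketch of this notation. The hypothesis shown, a straight line h(x) = θ0 + θ1·x with hand-picked parameters, and the extra rows of the toy table are illustrative assumptions; how the parameters are actually chosen is covered later in the course.

```python
# Toy housing training set: x = size in square feet, y = price in thousands of dollars.
# As in the notes, x^(1) = 2104 and y^(1) = 460; m is the number of training examples.
training_set = [(2104, 460), (1416, 232), (1534, 315), (852, 178)]
m = len(training_set)

def h(x, theta0=-40.0, theta1=0.2):
    """A straight-line hypothesis h(x) = theta0 + theta1 * x (parameters hand-picked)."""
    return theta0 + theta1 * x

# Predict a price for the friend's 1250-square-foot house.
print("m =", m)
print("predicted price for 1250 sq ft: about", round(h(1250)), "thousand dollars")
```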
[Stanford University online video course on robotics] IntroductionToRobotics-Lecture
IntroductionToRobotics-Lecture14Instructor (Oussama Khatib):Okay. Okay. Let’s get started. So today’s video segment is about tactile sensing. Now, I wonder what is difficult about building tactile sensors; anyone has an idea? So what is the problem with building a tactile sensor? Oh, you used to see the video first, okay. So, yeah.Student:Do you need functions to be able to, I mean, do you need a perturbation to be able to see what you’re touching sometimes?Instructor (Oussama Khatib):Well, yeah, sometimes you, I mean, a human – tactile sensing is amazing. So you have the static information, so if you grab something, now the whole surface is in contact, and you can determine the shape, right? So what does it mean in term of, like, designing a tactile sensor, just if you think about the static case?Student:It’s soft, malleable.Instructor (Oussama Khatib):Well, you need some softness in the thing you are putting. Then you need to take this whole information, what kind of resolution do you need, if you are touching to feel the edge? You need a lot of pixels, right? So how can you take this information and – first of all, how you determine that information; what kind of procedure do you – yes?Student:Well, there’s an element of pressure, like, how hard you’re – the average – how are you touching on all these different things.Instructor (Oussama Khatib):Okay. So you can imagine, maybe, a sort of resistive or capacitive sensor that will deflect a little bit and give you that information. How many of those you would need? You need, sort of, an array, right? So how large, like, let’s say this is the end of factor. I’m trying to see if you did that problem – you’re going to have a lot of information here, and you need to take it back, and you have a lot of wires; you have a matrix, and you’re going to have a lot of, basically, information to transmit. So, the design of tactile sensors being this problem of how we can put enough sensors, and how we can extract this information and take it back. So these guys came up with an interesting idea; here it is. The light, please. [Video]:A novel tactile sensor using optical phenomenon was developed. In the tactile sensors shown here, light is injected at the edge of an optical wave guide made of transparent material and covered by an elastic rubber cover. There is clearance between the cover and the wave guide. The injected light maintains total internal reflection at the surface of the wave guide and is enclosed within it. When an object makes contact with it, the rubber cover depresses and touches the wave guide. Scattered light arises at the point of contact due to the change of the reflection condition. Such tactile information can be converted into a visual image.Using this principle, a prototype finger-shaped tactile sensor with a hemispherical surface was developed. A CCD camera is installed inside the wave guide to detect scattered light arising at the contact location on the sensor’s surface. The image from the CCD camera is sent to the computer, and the location of the scattered light is determined by the image processing software. Using this information, the object’s point of contact on the sensor’s surface can be calculated.To improve the size and the operational speed of the sensor, a miniaturized version was developed. 
The hemispherical wave guide with cover, the light source infrared LED’s, a position-sensitive detector for converting the location of the optical input into an electric signal, and the amplifier circuit were integrated in the sensor body.The scattered light arising at the point of contact is transmitted to the detector through a bundle of optical fibers. By processing the detector’s electric signal by computer, it is possible to determine the contact location on the sensor’s surface in 1.5 milliseconds. Through further miniaturization, a fingertip diameter of 20 millimeters has been achieved in the latest version of the tactile sensor. It is currently planned to install this tactile sensor in a robotic hand with the aim of improving its dexterity.Instructor (Oussama Khatib):Okay. A cool idea, right? Because now you’re taking this information, and taking it into a visual image, and transmitting the image, and, in fact, this was done a long time ago. I believe the emperor of Japan was visiting that laboratory, and he saw this, and he was quite impressed.Before starting the lecture, just wanted to remind you that we are going to have two review sessions on Tuesday and Wednesday next week, and we will, again, sign up for two groups. I hope we will have a balance between those who are coming on Tuesday and Wednesday. We will do the signing up next Monday, so those who are not here today, be sure to come on Monday to sign up, all right?Okay. Last lecture we discussed the controlled structure. We were talking still about one degree of freedom, and we are going to pursue that discussion with one degree of freedom. So we are looking at the dynamic model of a mass moving at some acceleration, X double dot, and controlled by a force, F. So the control of this robot is done through this proportional derivative controller involving minus KP, X desired and minus KV, X dot. So the KP is your position gain, and the KV is your velocity gain.Now, if we take this blue controller and move it to the left, the closed loop behavior is going to be written as this second order equation, and in this equation, we can see that we have, sort of, mass, string, damper system whose rest position is at the desired XD position. So KV is your velocity gain, and KP is the position gain.Now, if we rewrite this equation by dividing it by M, we are going to be able to see what closed loop frequency we have and what damping ratio we have, and every time, the lecture time, this finishes. So what is your closed loop frequency? KP is equal to 10, and the mass is equal to 1; what is the closed loop frequency?Student:Square root of 10.Instructor (Oussama Khatib):Square root of 10, and what is the damping ratio? A little bit more complicated, but we can rewrite this same equation in this form, 2 zeta omega and omega square where omega is your closed loop frequency, and where zeta is this coefficient, KV divided by 2 square root of KM, and omega is simply the closed loop frequency square root of KP divided by M.So you remember this, but now the difference with before, before we had natural frequency, so we were talking about natural frequency and natural damping ratio. Now, this is your gain, and you are closing the loops, so this is your control gain; it’s the closed loop damping ratio and the closed loop frequency, okay? 
So the only difference is instead of a natural system with spring and damper, now we are artificially creating a frequency through this closed loop, or we are creating this damping ratio through KV.So, basically, this is what you are going to try to do, you are going to take your robot; you are going to find those gains, KP and KV, and try to control the robot with those gains. So, again, thinking about KP and KV, KV is affecting zeta, right? And KP is also affecting your omega. Now, when you are going to control your robot, what is the objective; what are you going to try to do? Let’s think about it. You’re trying to go somewhere, right, or you are trying to track a trajectory. So what do you want to achieve with those, I mean, here is your behavior; what would be good to achieve here? Yes. Student:It could see in critical damping.Instructor (Oussama Khatib):So we want to have a critically damped system most of the time, so we will reach those goal positions as quickly as possible without oscillation. So KV would be selected to achieve that value, and for that critically damped system, what is the value of zeta; anyone remembers? It was only two days ago. Zeta is equal to – for critically damped systems, zeta is equal to unity, 1. When zeta is equal to 1, that is when KV is equal to 2 square root of KPM, you have critically damped system.So, basically, if you know your KP, if you already selected your KP, and if you want critically damped system, then immediately you can compute KV from M and KP, right, for that value, for zeta. So, basically, you are trying to set zeta. What about omega? So now, we need to set KP in order to compute zeta, and how do we set omega? Someone? No idea? So you have your robot, you go and you want to control, let’s say, Joint 3. We can do it if you want. Where’s my glasses? Here’s the simulator. Oh, that doesn’t have an F factor. Let’s take this one.So, here are your gains, and right now, if we ask the robot to – so, the robot is floating, and if we ask the robot to go its zero position, it’s going to just move, and it’s moving with a KP equal 400 and KV equal 40. These are the gain we set for the robot, but, in fact, this is controlled also with dynamics. So we will get to this a little later, but if we want to see the control without dynamics, we take this, probably, non-dynamic joint control, so this one.So let’s float it a little bit. Actually, I can exert a little force outside and see if it can move; it’s really solid. Well, okay, won’t move it too much. So let’s reduce the gain here. So this springiness KP is 40. So see, now if I apply a force that is a deflection, right? And when I’m going to release, it’s going to go there, oscillate a little bit, tiny bit, not too much. In fact, this has a lot of friction, natural friction. If we remove the friction and do the same thing, it will probably oscillate more – hm, not enough. Okay. Wow, still there is friction – nope. So let’s put a little bit, minus how much? Minus two, this is -20; I think it will go unstable. Wow. So we see that your gain cannot be negative. It will – can you stop? Okay. We need some friction, otherwise it will not stop.So, in fact, you can see there is a lot of coupling. I moved just one joint, and everything else is moving. Let’s make this gain bigger. This is Joint 1, so if I pull on Joint 2, and the release – look at Joint 3; what is happening? So there is an inertial coupling coming from Joint 2 on Joint 3. Just by moving Joint 2, you are affecting Joint 3. 
You can see, again, Joint 2, release, and Joint 3 is moving. So in order to avoid that disturbance coming from the dynamic, what should we do with KP? Make it smaller or bigger? You’re not sure. Should we try it?So let’s make it bigger; how big? 400? Okay, 400. Now we realize with 400, this is not damped enough because we need to compute this to make it a little bigger, so let’s make it 20. Okay. So now, what do you expect; the disturbance will increase or will be reduced when I am going to release? More disturbance or less? Heath, less?Student:Less.Instructor (Oussama Khatib):Who agrees with less? Okay, and who disagrees with less? Everyone else, okay. So this is less? Yeah, it is less, actually. You’re removing little faster, and you are still oscillating, and oscillation is because we don’t have enough damping here. So if we increase the damping, it will oscillate less, and if we increase the gain – do you see what is happening now? It’s going very quickly to its position.So, in fact, the coupling – this is the degree – you look at the 90 degree between Joint Link 2 and Link 3. It is maintained, almost. In fact, if I increase Joint 2 as well, it will be hard to move it. So what is happening now with the response; do you see the response when we went to 1600? Faster or slower? Hm? Slower?Student:No.Instructor (Oussama Khatib):Faster. So the dynamic response of the closed loop is faster with higher gain. Well then, should we increase it, like, keep increasing? I don’t know. We can try.Student:But there’s a limit at some point.Instructor (Oussama Khatib):So what is the limit? So let’s make it 3,000. Now, Joint 3 is locked; it’s not moving anymore. Should we make it more? Okay. So what’s going tohappen? It’s not moving anymore. Now, the problem – if this was a real robot, would 30,000 work? Why?Student:Your motor’s gonna saturate at some level in –Instructor (Oussama Khatib):Well, suppose you have big motors. Yeah, saturation of the motors is one thing, but suppose you have really big motors; it’s not a limitation. Student:Wouldn’t you have some sort of air drift?Instructor (Oussama Khatib):Well, we’ll discuss it a little later, but, essentially, what is going to happen is that – remember, inside the structure you have motors, you have transmissions, you have gears, and all of these are going to move, and they have flexibility in the structure. This flexibility makes it that you start to excite those mode of the flexible system, and as you start moving, the motors start to vibrate, and if you have flexibility in the structure, the structures start to vibrate, and when you hit those frequencies of vibration, the system will just go unstable.So our KP, this KP that we want – oh, we closed it. Just one second, let’s go back there. So this KP we have here, this KP cannot go too high. We want it as high as possible to increase what? What it does when KP is high? Disturbance reduction because errors are coming – dynamic coupling coming from other links will be rejected; it’s stiffer. However, a KP cannot go too high because KP is deciding the closed loop frequency, and this closed frequency can go as high as those end-modeled flexibilities. Actually, we cannot even come close to them; we have to stay away from them. So omega cannot be too high, which means KP has a limit, but we want to achieve the highest KP.So what is the relationship between KP, KV, and those performance? So from those two equations, we can write KP is M omega square, and KV is M to zeta omega, right? 
Just to rewriting these two equations and computing KV and KP. So when we are controlling a system, we are going to set what? We’re going to set, really, the dynamics of the system, which means we need to set zeta and omega. So we set zeta and omega, and we can compute our KP and KV. Most of the time, zeta is equal to one. So KV is M to omega, and so all what is left is to set omega. So for 400, omega is equal to what? In the case of the robot in this simulation, we have 400 KP. So omega is equal to? Come on. Student:[Off mic].Instructor (Oussama Khatib):Square root –Student:[Off mic].Instructor (Oussama Khatib):Divided by – well, M is equal to 1, let’s say, in that case. It’s 20. It’s 20 multiply – what is the frequency, the real frequency?Student:[Off mic].Instructor (Oussama Khatib):Omega divided by 2 Pi, right. So what is your frequency about – let’s say divide by 6, 20 divided by 6. So it’s very low, 3-4 hertz. In fact, if you’re lucky, you can go, well, to 10 hertz. I mean, this would be great. So when we go to 1600, this is really nice, 40 divided by 6.Well, in practice, you start with very low gains, and you start turning your gains up, up, up, up, and suddenly you are going to hit that, noise start to vibrate. So go down, but we will see some ways of doing this in a more precise way, but, again, what you are seeing here is KP and KV – now, if we think about two different links, one link that is heavy, and one link that is light. M equal 1 and M equal 100. Your gain KP is going to be – for the same frequency, is going to be much, much bigger for the bigger link. So that gain is scaled by the mass, and because it is scaled by the mass, we can think about the problem of setting the gains for the unit mass system.You remember we said if I’m moving Joint 2, the inertia of Joint 2 is changing, big, small. So we need to be able to somehow account for the fact – so I set my frequency; I set omega and set zeta, and now I computed KP and KV, but M doubled. So I need to update my gains, right? If I want to move with the same closed loop frequency, I need, somehow, to update my gains, and that becomes nonlinear control. So we talk about the unit mass gains. So let’s just imagine that your system, this mass was unit mass. Your gains will be simply omega square and 2 zeta omega, which is for one, this would be 2 omega. Very simple, just set omega and you get your KP and KV. Okay?But we know the system is not going to be a unit mass. So for this M mass system, what are the gains? Gains from this KP prime and KV prime. What would be KP for M, a system with mass M using KP prime? KP will be M times KP prime, and KV just linear. So you take M, and you scale your gains, okay?Well, what is the big deal about this; why I’m talking about? Well, the big deal is that M is going to change, so even for one changing mass you can make this nonlinear, and scale and track a constant frequency and constant damping ratio, but for a system with many degrees of freedom, we have a mass matrix, and we are going to use the same concept. We are going to say I look at the unit mass system, and then I scale the unit mass system with the mass matrix, and everything will work exactly in the same way, and I will be compensating for the variation of the mass. This is the nonlinear dynamic of the coupling that we’re going to introduce, and it is based on the idea that I design the unit mass system, and then I will scale the unit mass system with the mass matrix. 
Well, in this case, it is just a scalar, simple mass.So this is what we call the control partitioning. If I have a system with a mass M, I basically – the composite in the mass and the unit mass system. So the blue is the unit mass system, and M is the scaling of the unit mass system. So I can now design a controller for the unit mass system with KP prime and KV prime, and then the KP and KV for the original system will be just scaled by that mass.So here is my controller F. I’m going to write it as M time F prime where F prime is this quantity, a TD controller designed for unit mass. So we always denote this as primes of KV or KP. So when we say prime, we are talking about the unit mass system. The controller of the unit mass system F prime, and F is M times that F prime. That will make more sense when we go through the multi degree of freedom controller because M becomes the mass matrix, okay?So, essentially, we have our initial system that is now controlled as a unit mass system scaled by the mass itself, and the behavior of the whole system is like this – well, the dynamic behavior, the dynamic response and the damping ratio are like this, but we have to be careful about other characteristics like the student’s rejection, stiffness; they are not, and we will see that in a second. The dynamic behavior of the closed loop is like this. So you design your controller for the unit mass and basically, if you scale with that mass, then you have the behavior of the unit mass, okay? So, in this case, what is omega for the system? It is simply the square root of KP prime, okay? And now we are going to introduce one more element. We talked about it Monday, and this is just a tiny nonlinearity. Let’s add some friction.So we started with the system without any nonlinearity, and now I’m just adding a little bit of friction, nonlinear friction, like some stiction on that joint. So the equation changed completely. That is, it’s not nonlinear anymore. We cannot just treat it as a linear system, and we have to deal with a controller that is going to be nonlinear. So how can we deal with this? Come on, ideas.So you have your joint, and it has a gear with, like, some friction that is – or even it has some gravity or whatever. Yes.Student:So if you’ve got a certain type of friction, you can, like, if it’s velocity, then you can put that into the motion equation –Instructor (Oussama Khatib):Um, hm.Student:- and change your V value, your KV.Instructor (Oussama Khatib):KV, you mean.Student:Yeah, yeah.Instructor (Oussama Khatib):The KV. So if it is linear, yeah, I think you can, in fact, integrate it directly into KV, but if it is not nonlinear, like just the gravity. So what do we do – if we have the gravity, what do we do with the gravity? We model it. I know the model because I know the mass, the center of mass, all of these things. So if I can model it, I can somehow, like, anticipate what the gravity is going to be and try to compensate for it, very good. So we can compensate for the gravity.Well, if we have a nonlinear term, what we will do is we put that compensation in the controller. So now the controller, it has the linear part which was F prime alpha F prime. Alpha F prime actually is mass F prime, and now we are going to add another term, beta, which will attempt to compensate for B. You do not know B exactly. You know, sort of, a model with some estimate of B. You don’t know X exactly. You don’t know X dot exactly. You have estimate of these, what we call the X hat, X dot hat, and B hat. 
Now, B has a structure. If it’s the gravity, it’s going to be, I don’t know, ML cosign that angle, and you can estimate your mass, estimate your length, estimate the position and come up with an estimate of B, which would be B hat.So, in that case, you can say alpha is simply the mass, an estimate of the mass,minus/plus one gram, probably you will find it, and your B hat is going to be an estimate of B given the state, your estimate of the state, and you’ll probably have ten epsilons, little bit more of error. So we’re assuming that we are going to have some errors, but by compensating for those nonlinearities, estimating the gravity and taking it out, later estimating centrifugal coriolis forces and trying to taking them out, we should be able to bring the closed loop system closer to a system that is a unit mass system because with this compensation, if everything was perfect, we compensated perfectly B, then basically beta will take out B. For each configuration, each velocity, beta is exactly compensating for B; it takes it out, and the system is linearized, right? Well, this will never happen in reality, but we will be very close.So this is what we can write. We can say this is our system, and this is the controller. You understand this controller? This controller is a nonlinear controller, but it is attempting to render in the closed loop, your system, to become the coupled linear system. So here’s the result. If B and B hat were identical – if B hat was compensating perfectly for B, and if the estimate of the mass matrix, later this mass was identical to M, then your system will behave this way. So what you designed for F prime will be part of the closed loop of the whole system. We’re talking about 1 degree of freedom, but if we are – later we will see 20 degree of freedom, it would be the same, okay?Well, here is how we can write this system. So our system was F with the output X, X dot, the state. Basically what we are doing is we are looking at the model of the system, and we are using X and X dot to estimate B, the nonlinearities in the system, and compensate for them. So F is going to have a component, which is B hat. In addition, our input control, which is F prime, is going to be scaled by an estimate of M, the mass of the system so that there is a virtual system here that would look like a unit mass system with an input F prime and this same output, and this big box, the red box, is like a system that is linear with unit mass, and that is the purpose of this design. Later, this will be centrifugal coriolis gravity forces, and this would be what – right, the mass matrix. So, in fact, with many degrees of freedom, we will be able to do the same thing where this becomes the mass matrix, and here we will have V and G. You remember V? Centrifugal coriolis, and G, gravity, and you can add the friction as well, okay?So, essentially, we are designing a nonlinear controller to compensate for centrifugal coriolis, gravity, and to decouple the system, to decouple the masses, the inertial forces, and to achieve a unit mass system behavior.Okay. So let’s see our design for F prime. F prime is in this structure, in the decoupled controlled structure, and if you have a desired position XD, what would be F prime? Just a goal position, so our goal position, we have X desired. F prime will be minus, minus something. Who remembers? I’m sure you remember. 
F prime is?Student:Minus KV prime minus X dot minus KT prime times X minus XD.Instructor (Oussama Khatib):You meant minus KP prime, X minus XD. So minus KV prime X dot minus KP prime, X minus XD, and the closed loop behavior would be very nice. So we linearized the system. All right. Well, most of the time you’re not just going to a goal position. Most of the time you are tracking a trajectory, and on this trajectory you might have, like, you might have different accelerations at different point. You have different velocities, and whereas in this controller, we are just reaching through the goal position. KP prime is trying to reduce the error, and KV prime is trying to put just damping to bring the velocity to zero at the end point, but if you are tracking a trajectory, you have all of these desired things. You have desired position, function of time, desired velocity, and desired acceleration.So we need to design a controller that is more suited for this. So what F prime would be? See, now we forget about the system because we know we can decouple it, make it linear. Let’s think about the unit mass system, how you would design a unit mass system controller, and then you put it in that structure. So what is the objective if you have all these desired things? What should F prime be?Okay. So you see on the top here is F prime. I have some desired acceleration. I have my acceleration, unit mass acceleration, equal to F prime, and I know my desired acceleration; it’s X double dot desired. So if this was really a perfect system, and you are trying to track this acceleration desired, what F prime should be? I think the question is so simple that you cannot believe it. Come on, this is very simple, too simple. So my system is X double dot, and I know the desired acceleration, X double dot desired. What should F prime be? Come on.Student:Minus the cost of minus X double dot, minus X, E double dot.Instructor (Oussama Khatib):Yeah, I think you went too far. That is correct, but I’m just saying if the system was able to respond directly to F prime with no errors, nothing, and my system is X double dot, and have the desired acceleration X double dot desired. What I would do with F prime, just make F prime equal to?Student:X double dot.Instructor (Oussama Khatib):X double dot desired, right? Right? Okay. Okay, you see what we’re talking about? You have your acceleration desired, so just put X double dot equal X double dot desired, and everything should just – you apply this force, and the system should follow X double dot desired, right?Well, it won’t. It will drift because there is really no feedback. You have your acceleration, and you are saying X double dot desired, this is my acceleration desired, and as soon as you start, the system will start accumulating errors, and it will drift. So what should we do? We should do the PD part, and that’s why now we are going to add proportional control to the error, the position error. As you said, minus KP prime, X minus X desired.What about the error in velocity? Because now I have X dot desired. What would be the term that I should use to follow X dot desired? So that would be minus KV – could you finish it? Minus KV –Student:X dot minus X desired dot.Instructor (Oussama Khatib):Exactly, from the error, X minus X dot desired, and I will – so here is the controller. So this time, if I have the full trajectory, I will form errors on the position, on the velocity, and I would feed forward the acceleration. 
So essentially, you are telling the system follow this desired acceleration. It’s not going – there will be errors, and I’m tightening these errors. So the closed loop behavior of this is going to be controlling the error in acceleration, in velocity, and in position, if I have the full trajectory in time, and that will, basically, if I call X minus X desired the error, then I’m really controlling the error as a second order linear system, all right?Okay. So now, we have to make sure that we can do this with the whole robot, and we have to make sure that this controller could work with those gains that we are trying to achieve, and we start analyzing the system. So let’s imagine that I designed the system, the compensation, with the B hat – I’m sorry, they are not appearing as hats, but this is B hat and M hat, and I get everything over there, but then – now we are talking about the real system. So when we were running the simulation earlier we saw that a small external force will disturb the system. So there are a lot of forces coming from the errors in dynamics, errors in the gravity estimates, nonlinear forces coming from the gears and the friction that will affect this behavior, and as we start introducing disturbances in the system, we are going to see that these gains that we set are going to play a very important role in disturbance rejection.So let’s add a little bit of disturbance here. So if we add some disturbance, going to take a very simple type of disturbance like a bounded disturbance that we are adding from some, like, type of error in the gravity. Imagine that you have this link, and you have a little disturbance coming from the gravity. So what is the affect of this disturbance on the closed loop now?。
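The single-joint controller developed in this lecture can be summarized in a short sketch: choose a closed-loop frequency ω and damping ratio ζ (ζ = 1 for critical damping), form the unit-mass gains K'p = ω² and K'v = 2ζω, and scale the unit-mass trajectory-tracking PD law by the estimated mass, adding a compensation term for the nonlinearities. The specific numbers and the toy simulation loop below are illustrative assumptions, not values from the course simulator.

```python
# Unit-mass gains from the desired closed-loop frequency and damping ratio.
omega, zeta = 20.0, 1.0            # rad/s, critically damped
kp_prime = omega**2                # K'_p = omega^2
kv_prime = 2.0 * zeta * omega      # K'_v = 2 * zeta * omega

def control(x, xd, x_des, xd_des, xdd_des, M_hat, B_hat):
    """F = M_hat * F' + B_hat, with F' the unit-mass trajectory-tracking PD law."""
    f_prime = xdd_des - kv_prime * (xd - xd_des) - kp_prime * (x - x_des)
    return M_hat * f_prime + B_hat

# Toy simulation of a single joint with mass M tracking a step to x_des = 1.
M, dt = 3.0, 1e-3
x, xd = 0.0, 0.0
for _ in range(int(1.0 / dt)):
    F = control(x, xd, x_des=1.0, xd_des=0.0, xdd_des=0.0, M_hat=M, B_hat=0.0)
    xdd = F / M                    # true dynamics: M * xdd = F (no friction modeled here)
    xd += xdd * dt
    x += xd * dt
print("position after 1 s:", round(x, 4))   # close to 1.0, with no overshoot
```

Because the estimated mass here equals the true mass and there are no unmodeled nonlinearities, the closed loop behaves exactly like the unit-mass design: critically damped at the chosen ω, which is the point of the control-partitioning idea discussed in the lecture.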
Stanford University Open Course: the Machine Learning course

With the spread of the concept of "intelligent manufacturing", research on and applications of artificial intelligence technology are attracting more and more attention.
The use of artificial intelligence in manufacturing has become key to making manufacturing knowledge-driven, automated, and flexible, so that it can respond quickly to the market. Machine learning is without doubt one of the research directions most likely to deliver this "intelligence".

Stanford's "Stanford Engineering Everywhere" program offers the university's most popular engineering courses free of charge to students and educators around the world. Thanks to this program, we have the chance to start from the same knowledge starting line as the rest of the world. The course has 20 episodes, all of which have been translated on NetEase Open Courses. Lecturer: Andrew Ng.
[Episode 1] The motivation and applications of machine learning. Overview: the motivation and applications of machine learning, logistic classification, the definition of machine learning, an overview of supervised learning, an overview of learning theory, an overview of unsupervised learning, and an overview of reinforcement learning.
[Episode 2] Applications of supervised learning. Overview: an application of supervised learning, autonomous driving with the ALVINN system; linear regression; gradient descent; batch gradient descent; stochastic gradient descent; derivation of the normal equations.
[Episode 3] The concepts of underfitting and overfitting. Overview: underfitting and overfitting, parametric and non-parametric algorithms, locally weighted regression, the probabilistic interpretation of linear models, logistic regression, and the perceptron.
[Episode 4] Newton's method. Overview: Newton's method, which can replace gradient ascent for computing the maximum of a function; the exponential family of distributions, illustrated with the Gaussian and Bernoulli distributions; and, building on the exponential family, generalized linear models, in which a model can be derived directly by specifying a probability distribution.
[Episode 5] Generative learning algorithms. Overview: a new class of learning algorithms, generative learning algorithms, with a detailed treatment of one example, Gaussian discriminant analysis; a comparison of generative learning algorithms with the discriminative learning algorithms seen earlier; and finally Naive Bayes, an algorithm well suited to classifying text, together with a commonly used smoothing technique, Laplace smoothing.
[Episode 6] The Naive Bayes algorithm. Overview: the two event models for Naive Bayes; neural network algorithms; and finally two important concepts, the functional margin and the geometric margin. Based on these two concepts, a linear classification algorithm is introduced: the maximum margin classifier. That algorithm is used to lead into a very important non-linear classification algorithm: the support vector machine.
[Episode 7] The optimal margin classifier. Overview: first the original optimization problem, the optimal margin classifier; then the concept of the dual problem and the KKT conditions; based on an analysis of the dual of the original optimization problem, the SVM algorithm; and finally an assessment of the SVM algorithm, leading into the next lecture's introduction of kernel methods.
[Episode 8] The sequential minimal optimization algorithm. Overview: the concept of kernels, which have important applications in SVMs and in many other learning algorithms; the l1-norm soft-margin SVM, a variant of the SVM that can handle data that are not linearly separable; and the SMO algorithm, an efficient algorithm for solving the SVM optimization problem.
[Episode 9] Empirical risk minimization. Overview: a common phenomenon in model selection, the bias-variance tradeoff. To explain this concept, the lecture first introduces two important lemmas, the union bound and Hoeffding's inequality; it then defines two important notions, training error and generalization error, and presents a simplified model of a machine learning algorithm, empirical risk minimization (ERM); finally, based on these concepts, it proves a theoretical upper bound for ERM and uses that bound to explain the bias-variance tradeoff.
[Episode 10] Feature selection. Overview: the concept of VC dimension, which extends the bounds on the generalization error of ERM to infinite hypothesis classes; the model selection problem, with a concrete look at cross-validation and several of its variants; and the feature selection problem, with two families of methods, wrapper feature selection and filter feature selection.
[Episode 11] Bayesian statistics and regularization. Overview: Bayesian statistics and regularization; a brief introduction to the concept of online learning; diagnostic techniques for problems in designing machine learning algorithms; two analysis techniques, error analysis and ablative analysis; and two ways of applying machine learning algorithms and the scenarios each suits.
[Episode 12] The k-means algorithm. Overview: unsupervised learning. First the k-means clustering algorithm; then the mixture-of-Gaussians model, a special case of the expectation-maximization (EM) algorithm; Jensen's inequality is introduced and used to derive the general form of the EM algorithm.
[Episode 13] Gaussian mixture models. Overview: a derivation of the results for the mixture-of-Gaussians model under the EM algorithm, and the application of EM to a mixture of Bayes models. Finally, the factor analysis algorithm, which can fit a model when the data are high-dimensional but the number of samples is small.
[Episode 14] Principal component analysis. Overview: continuing the previous lecture, a detailed derivation of the EM steps for the factor analysis problem, highlighting the points that require care; then the principle and main applications of principal component analysis (PCA), a commonly used algorithm for reducing the dimensionality of data.
[Episode 15] Singular value decomposition. Overview: principal component analysis, with an example of using PCA to find similar documents; SVD (singular value decomposition); unsupervised algorithms and factor analysis; ICA (independent component analysis) and the CDF (cumulative distribution function), with a review of the Gaussian distribution; and finally several examples of applying ICA.
[Episode 16] Markov decision processes. Overview: first supervised learning; then reinforcement learning, illustrated with the example of making a helicopter fly; Markov decision processes (MDPs) and the two algorithms they lead to for solving for the optimal policy and the optimal return; and finally a detailed look at how value iteration and policy iteration are carried out, with a comparison of their strengths and weaknesses.
[Episode 17] Discretization and the curse of dimensionality. Overview: continuing with Markov decision processes (MDPs) and algorithms for solving continuous-state MDPs, focusing on fitted value iteration and approximate policy iteration, explained through concrete examples and solution methods.
[Episode 18] Linear quadratic regulation control. Overview: MDP control algorithms and non-linear dynamical systems; models of dynamical systems and linear quadratic regulation (LQR) control, deriving functions for handling these situations; it also covers building linear models and linearizing non-linear models.
[Episode 19] Differential dynamic programming. Overview: reinforcement learning algorithms and how to debug them; the Kalman filter and differential dynamic programming; an algorithm combining Kalman filtering with LQR control (LQG control, linear quadratic Gaussian); and a comparison of the efficiency of the Gaussian distribution and the Kalman filter.
[Episode 20] Policy search. Overview: a review of reinforcement learning algorithms; some material on POMDPs (partially observable Markov decision processes) and on fully observable MDPs; policy search algorithms, including two algorithms, REINFORCE and Pegasus; and finally some courses related to this one and some parting advice for the students.
Terminology

Machine learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and more. It studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills, and how they can reorganize existing knowledge structures to keep improving their own performance. Machine learning is the core of artificial intelligence and the fundamental route to making computers intelligent; its applications span every area of AI, and it relies mainly on induction and synthesis rather than deduction.

I. Research significance

As its name suggests, machine learning is the study of how to use machines to simulate human learning activities. A stricter formulation is: machine learning is the study of how machines acquire new knowledge and new skills and recognize existing knowledge. The "machines" here are computers: electronic computers, neutron computers, photonic computers, neural computers, and so on.

Machine learning has several definitions: "Machine learning is a science of artificial intelligence; the field's main object of study is artificial intelligence, in particular how to improve the performance of specific algorithms through learning from experience." "Machine learning is the study of computer algorithms that improve automatically through experience." "Machine learning is the use of data or past experience to optimize a computer program's performance criterion." A frequently cited English definition is: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Learning is an important intelligent behavior of humans, but what exactly learning is has long been debated; sociologists, logicians, and psychologists each have their own views. Can machines learn the way humans do? In 1959 Arthur Samuel in the United States designed a checkers-playing program with the ability to learn; it could improve its play through repeated games. Four years later the program beat its own designer, and after another three years it beat a US champion who had been undefeated for eight years. The program showed people what machine learning can do and raised many thought-provoking social and philosophical questions.

Can a machine's ability surpass a human's? A main argument of many who say no is that machines are made by people, and their performance and actions are entirely specified by their designers, so their ability can never exceed the designer's own. This is indeed true for machines that cannot learn, but it is worth reconsidering for machines that can, because such a machine's ability keeps improving with use; after a while, even the designer no longer knows what level it has reached.

II. Main strategies

Learning is a complex intelligent activity, and the learning process is closely tied to the reasoning process. By the amount of reasoning used during learning, the strategies adopted in machine learning can be broadly divided into four kinds: rote learning, learning by being told, learning by analogy, and learning from examples. The more reasoning a learning strategy uses, the more capable the system.

III. Basic structure

The environment provides certain information to the system's learning element; the learning element uses this information to modify the knowledge base, so as to improve the performance of the system's execution element at its tasks; the execution element carries out its tasks based on the knowledge base and at the same time feeds the information it obtains back to the learning element. In a concrete application, the environment, the knowledge base, and the execution element determine the specific work to be done, and the problems the learning element must solve are entirely determined by these three parts.
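The basic structure just described (environment, learning element, knowledge base, and execution element connected in a feedback loop) can be pictured as a small skeleton. All class, method, and data names below are purely illustrative; the article does not prescribe any particular implementation.

```python
class LearningSystem:
    """Skeleton of the four-part structure: environment, learning element,
    knowledge base, and execution element, connected in a feedback loop."""

    def __init__(self, environment):
        self.environment = environment
        self.knowledge_base = {}          # whatever the learning element has acquired so far

    def learn(self, information):
        # Learning element: use information (from the environment or from execution
        # feedback) to modify the knowledge base.
        self.knowledge_base.update(information)

    def execute(self, task):
        # Execution element: carry out the task using the knowledge base, and feed
        # what was observed back to the learning element.
        result = {"task": task, "knowledge_used": dict(self.knowledge_base)}
        self.learn({task: result})        # feedback from execution to learning
        return result

# One cycle of the loop: the environment supplies information, the system learns,
# then executes a task and feeds its experience back into the knowledge base.
system = LearningSystem(environment="housing market data")
system.learn({"price_trend": "rising"})
print(system.execute("estimate price"))
```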