CMU高级机器学习非参数贝叶斯模型

相关主题
  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

Advanced Machine Learning Nonparametric Bayesian Models --

Learning/Reasoning in Open Possible Worlds --Learning/Reasoning Eric Xing Lecture 17, August 14, 2009 Reading: Eric Xing © Eric Xing @ CMU, 2006-2009 1 Clustering Eric Xing © Eric Xing @ CMU, 2006-2009 2 1

Image Segmentation How to segment images? Manual segmentation (very expensive Algorithm segmentation K-means Statistical mixture models Spectral clustering Problems with most existing algorithms Ignore the spatial information Perform the segmentation one image at a time Need to specify the number of segments a priori Eric Xing © Eric Xing @ CMU, 2006-2009 3 Object Recognition and Tracking (1.9, 9.0, 2.1 (1.8, 7.4, 2.3 (1.9, 6.1, 2.2 (0.7, 5.1, 3.2 (0.9, 5.8, 3.1 (0.6, 5.9, 3.2 t=1 Eric Xing t=2 © Eric Xing @ CMU, 2006-2009 t=3 4 2

Modeling The Mind … Latent brain processes: View picture Read sentence Decide whether consistent fMRI scan: ∑ … … Eric Xing … t=1 © Eric Xing @ CMU, 2006-2009 t=T 5 The Evolution of Science Research circles Phy Research topics Bio CS PNAS papers 1900 Eric Xing © Eric Xing @ CMU, 2006-2009 2000 6 ? 3

A Classical Approach Clustering as Mixture Modeling Then "model selection" Eric Xing © Eric Xing @ CMU, 2006-2009 7 Partially Observed, Open and Evolving Possible Worlds Unbounded # of objects/trajectories Changing attributes Birth/death, merge/split Relational ambiguity The parametric paradigm: p φk0 ({ } or p({φ } 1:T k Event model p φkt +1 motion model t k ({ } {φ } Ξ*+1|t+1 t Finite Entity space Structurally unambiguous Ξ*|t t Sensor model p(x | {φk } observation space How to open it up? © Eric Xing @ CMU, 2006-2009 8 Eric Xing 4

Model Selection vs. Posterior Inference Model selection "intelligent" guess: ??? cross validation: data-hungry information theoretic: AIC TIC MDL : ˆ arg min KL f (⋅ | g (⋅ | θ ML , K Parsimony, Ockam's Razor need to compute data likelihood ( Bayes factor: Posterior inference: we want to handle uncertainty of model complexity explicitly

p (M | D ∝ p (D | M p (M we favor a distribution that does not constrain M in a "closed" space! M ≡ {θ , K } Eric Xing © Eric Xing @ CMU, 2006-2009 9 Two "Recent" Developments First order probabilistic languages (FOPLs Examples: PRM, BLOG … Lift graphical models to "open" world (#rv, relation, index, lifespan … Focus on complete, consistent, and operating rules to instantiate possible worlds, and formal language of expressing such rules Operational way of defining distributions over possible worlds, via sampling methods Bayesian Nonparametrics Examples: Dirichlet processes, stick-breaking processes … From finite, to infinite mixture, to more complex constructions (hierarchies, spatial/temporal sequences, … Focus on the laws and behaviors of both the generative formalisms and resulting distributions Often offer explicit expression of distributions, and expose the structure of the distributions --motivate various approximate schemes Eric Xing © Eric Xing @ CMU, 2006-2009 10 5

Clustering How to label them ? How many clusters ??? Eric Xing © Eric Xing @ CMU, 2006-2009 11 Ra ndom Partition of Probability Space {φ4 , π 4 } {φ6 , π 6 } {φ5 , π 5 } {φ3 , π 3 } {φ1 1 . (event, pevent centroid :=φ Image ele. :=(x,θ ,π } {φ2 , π 2 } … Eric Xing © Eric Xing @ CMU, 2006-2009 12 6

Stick-breaking Process π G = ∑ k δ (θk θk ~ G0 k =1 k =1 ∞ ∞ 0 0.6 0.4 0.5 0.8 0.4 0.3 0.24 π ∑ k =1 k -1 Location 0.3 π k = βk ∏(1 - βk j =1 βk ~ Beta(1, α Eric Xing Mass © Eric Xing @ CMU, 2006-2009 G0 13 Chinese Restaurant Process θ1 θ2 P(ci = k | c -i = 1 1 1+α 1 2 +α 1 3+α m1 i + α -1 α 1+α 1 2 +α 2 3 +αm2 i + α -1 0 0 0 α 2 +α α 3

+α .... α i + α -1 CRP defines an exchangeable distribution on partitions over an (infinite sequence of samples, such a distribution is formally known as the Dirichlet Process (DP Eric Xing © Eric Xing @ CMU, 2006-2009 14 7

D irichlet Process φ4 φ6 φ3 φ5 φ φ1 2 a distribution A CDF, G, on possible worlds of random partitions follows a Dirichlet Process if for any measurable finite partition (φ1,φ2, .., φm: (G(φ1, G(φ2, …, G(φm ~ Dirichlet( αG0(φ1, …., αG0(φm another distributio n where G0 is the base measure and α is the scale parameter Thus a Dirichlet

相关文档
最新文档