South China University of Technology "Pattern Recognition" Course Project Report


Topic: Introduction to Pattern Recognition Experiments

School: Computer Science and Engineering

Major: Computer Science and Technology (All-English Innovation Class)

Student Name: Huang Weijie

Student ID: 201230590051

Instructor: Wu Si

Course Code: 145143

Course Credits: 2

Start Date: May 18, 2015

Experiment Overview

【Purpose and Requirements】

Purpose:

Develop classifiers, which take input features and predict the labels.

Requirements:

• Include explanations about why you choose the specific approaches.

• If your classifier includes any parameter that can be adjusted, please report the effect of the parameter on the final classification result.

• In evaluating the results of your classifiers, please compute the precision and recall values of your classifier.

• Partition the dataset into 2 folds and conduct a cross-validation procedure in measuring the performance (a sketch of this and the previous requirement is given after this list).

• Make sure to use figures and tables to summarize your results and clarify your presentation.
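For reference, the MATLAB sketch below illustrates how the two evaluation requirements might be met: a random 2-fold split followed by precision and recall computed on each fold. It is only a sketch, assuming the labels are in {0, 1}; trainAndPredict is a hypothetical wrapper around any of the classifiers used later, not part of the project's actual code.

    % Sketch: 2-fold cross-validation with precision and recall.
    n    = size(X, 1);
    p    = randperm(n);
    fold = {p(1:floor(n/2)), p(floor(n/2)+1:end)};
    prec = zeros(1, 2);  rec = zeros(1, 2);
    for k = 1:2
        te = fold{k};  tr = fold{3-k};                  % test on one fold, train on the other
        yPred = trainAndPredict(X(tr,:), y(tr), X(te,:));
        yTest = y(te);
        tp = sum(yPred == 1 & yTest == 1);              % true positives
        fp = sum(yPred == 1 & yTest == 0);              % false positives
        fn = sum(yPred == 0 & yTest == 1);              % false negatives
        prec(k) = tp / (tp + fp);
        rec(k)  = tp / (tp + fn);
    end
    fprintf('Precision: %.3f  Recall: %.3f\n', mean(prec), mean(rec));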

【Experiment Environment】

Operating system: Windows 8 (64-bit)

IDE: MATLAB R2012b

Programming language: MATLAB

Experiment Content

【Experiment Design】

The main steps of the project are:

1. To make the task more challenging, I selected the larger dataset, Pedestrian, rather than the smaller one. However, it may not be wise to learn on such a large dataset directly, so I first normalize the data to the range [0, 1] and perform k-means sampling to select the most representative samples. After that, feature selection is carried out to reduce the number of features. Finally, PCA dimensionality reduction is applied to further reduce the size of the dataset.

2. Six learning algorithms, namely k-nearest neighbor, perceptron, decision tree, support vector machine, multi-layer perceptron, and naïve Bayes, are used to learn the pattern of the dataset.

3. Each of the six learning algorithms is then combined into its own multi-classifier system using the bagging algorithm (a minimal sketch of this step is given below).
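The sketch below shows the bagging combination in MATLAB under stated assumptions: trainBase and predictBase are hypothetical wrappers around any one of the six base learners, and majority voting is assumed as the combination rule.

    % Bagging sketch: T bootstrap samples, one base classifier per sample,
    % prediction by majority vote across the ensemble. (Save as baggingPredict.m.)
    function yPred = baggingPredict(Xtrain, ytrain, Xtest, T)
        n      = size(Xtrain, 1);
        models = cell(1, T);
        for t = 1:T
            boot      = randi(n, n, 1);              % bootstrap sample (with replacement)
            models{t} = trainBase(Xtrain(boot, :), ytrain(boot));
        end
        votes = zeros(size(Xtest, 1), T);
        for t = 1:T
            votes(:, t) = predictBase(models{t}, Xtest);
        end
        yPred = mode(votes, 2);                      % majority vote
    end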

Experiment Procedure:

The input dataset is normalized to the range [0, 1], which makes it suitable for k-means clustering and also speeds up the learning algorithms.
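A minimal MATLAB sketch of this min-max normalization is given below, assuming the samples are stored row-wise in a matrix X; bsxfun is used for compatibility with R2012b.

    % Min-max normalization of each feature (column) of X to [0, 1].
    Xmin  = min(X, [], 1);
    Xmax  = max(X, [], 1);
    range = max(Xmax - Xmin, eps);                   % guard against constant features
    Xnorm = bsxfun(@rdivide, bsxfun(@minus, X, Xmin), range);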

There are too many samples in the dataset; only a small fraction of them is enough to learn a good classifier. To select the most representative samples, k-means clustering is used to partition the samples into c groups, from which r% of the samples are selected. There are 14,596 samples initially, but 1,460 may be enough, so r = 10 (a sketch of this sampling step is given after the criteria below). The selection of c should follow three criteria:

a) Little loss of accuracy

b) Little change in the ratio of the two classes

c) The smaller c is, the lower the time complexity
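The following MATLAB sketch shows one way such k-means sampling could be implemented with the Statistics Toolbox function kmeans. Keeping the points closest to each centroid is my own assumption here; the report does not specify the exact selection rule.

    % k-means based subsampling: keep r% of each cluster, closest to the centroid.
    % Xnorm is the normalized n-by-d data matrix, y the n-by-1 label vector.
    c = 9;                                   % example number of clusters
    r = 0.10;                                % keep 10% of the samples
    [idx, ~, ~, D] = kmeans(Xnorm, c, 'EmptyAction', 'singleton');
    keep = [];
    for i = 1:c
        members  = find(idx == i);
        [~, ord] = sort(D(members, i));      % distance to this cluster's centroid
        nKeep    = max(1, round(r * numel(members)));
        keep     = [keep; members(ord(1:nKeep))]; %#ok<AGROW>
    end
    Xsample = Xnorm(keep, :);
    ysample = y(keep);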

I therefore designed two experiments to find the best parameter c:

Experiment 1:

Find the training accuracy for different numbers of clusters. The result is shown in the figure on the left: the x-axis is the number of clusters and the y-axis is the accuracy. The red line denotes the accuracy before sampling and the blue line denotes the accuracy after sampling. As shown in the figure, c = 2, 5, 7, 9, 13 may be good choices since they give relatively higher accuracy.
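A hedged sketch of how this experiment could be scripted is shown below; trainAndScore and sampleByKmeans are hypothetical helpers (the latter wrapping the sampling code above), not the project's actual functions.

    % Experiment 1 sketch: training accuracy vs. number of clusters c.
    cValues = 2:15;
    accSamp = zeros(size(cValues));
    accFull = trainAndScore(Xnorm, y) * ones(size(cValues));   % before sampling
    for i = 1:numel(cValues)
        [Xs, ys]   = sampleByKmeans(Xnorm, y, cValues(i), 0.10);
        accSamp(i) = trainAndScore(Xs, ys);                     % after sampling
    end
    plot(cValues, accFull, 'r-', cValues, accSamp, 'b-');
    xlabel('Number of clusters'); ylabel('Accuracy');
    legend('Before sampling', 'After sampling');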
