weka算法参数整理

合集下载

WEKA聚类算法wine数据集分析研究报告

WEKA聚类算法wine数据集分析研究报告一、引言WEKA是一款强大的机器学习软件，它提供了多种聚类算法，包括K-Means、SimpleKMeans、BIRCH等。

这些算法可以用来解决各种不同的聚类问题。

在本文中，我们将使用WEKA的聚类算法对wine数据集进行分析和研究。

二、数据集介绍wine数据集是一个非常知名的数据集，它包含了178个样本和13个特征。

这些特征包括醇类、酸度、PH值等，可以用来预测葡萄酒的质量。

这个数据集是一个多类别的数据集，它的类别数是3。

三、WEKA聚类算法介绍WEKA的聚类算法有很多种，其中最常用的是K-Means算法。

K-Means 算法是一种迭代的算法，它将数据集划分为K个簇，每个簇的中心点是该簇所有点的平均值。

这个算法的目标是最小化所有簇内的距离之和。

四、实验过程1、数据预处理：我们对wine数据集进行预处理，包括去除缺失值、标准化数据等。

2、聚类实验：然后，我们使用WEKA的K-Means算法对wine数据集进行聚类实验。

我们设定了不同的K值，进行了多次实验，并记录了每次实验的结果。

3、结果分析：我们分析了实验结果，发现当K=3时，聚类效果最好。

此时，每个簇的样本数分别是60、61和57，非常接近于原始数据集中的类别数。

五、结论通过WEKA聚类算法对wine数据集的分析和研究，我们发现当K=3时，聚类效果最好。

这意味着wine数据集可以被分为三个类别，每个类别对应一种葡萄酒。

这个结果与实际情况相符，说明我们的聚类方法是有效的。

六、展望未来，我们可以进一步研究WEKA的其他聚类算法，如SimpleKMeans、BIRCH等，看看它们是否可以更好地解决wine数据集的聚类问题。

我们也可以研究如何通过调整WEKA的参数来优化聚类效果。

聚类分析算法研究聚类分析是一种无监督学习方法，它在许多领域都有广泛的应用，包括数据挖掘、机器学习、图像处理、生物信息学等。

在本文中，我们将探讨聚类分析的基本概念、常见的聚类算法以及未来的研究方向。

weka算法参数整理

1.关联算法1.1.Aprior算法1.1.1.Apriori算法weka参数界面概要实现Apriori关联规则挖掘算法，挖掘出给定参数条件下的关联规则。

此迭代的减少最小支持度直到发现设定最小置信度下的规则数目。

1.1.2.Apriori算法参数配置说明英文名称中文翻译默认值取值范围参数说明car分类关联分析False False返回常规的关联分析规则True返回指定分类属性的关联规则classIndex分类属性索引-1{-1,[1,N]}int-1代表最后一列，设置的数字代表相应的列作为分类属性；Car为True时生效。

delta delta0.05(0,1)每次迭代upperBoundMinSupport减少的数值，直到最小支持度或设定规则数目。

lowerBoundMinSupport最小支持度下限0.1(0,upperBoundMinSupport)迭代过程中最小支持度的下限。

metricType度量类型confidence Confidence(置信度)规则项集数目占规则前件数目比例；car为True，metricType只能用confidence。

Lift(提升度)>1P(A,B)/P(A)P(B)；规则前件和规则后件同时发生的概率除以分布单独发生的概率之积；Lift=1时表示A和B独立，数值越大前后件关联性越强。

Leverage(杠杆率)P(A,B)-P(A)P(B)；Leverage=0时A和B独立，数值越大A和B的关联性越强。

Conviction(确信度)P(A)P(!B)/P(A,!B)（!B表示B没有发生）Conviction也是用来衡量A和B的独立性。

从它和lift的关系（对B取反，代入Lift公式后求倒数）可以看出，这个值越大,A、B越关联。

minMetric最小度量值0.9根据metricType取值不同Confidence(0,1);lift>1;leverage>0;conviction(0,1)numRules规则数目10[1,+∞]int关联算法产生规则的数目outputItemSets输出项集False False不输出频繁项集True输出频繁项集removeAllMissingCols移除空列False False不移除所有值都缺失的列True移除所有值都缺失的列significanceLevel显著性水平-1？(0,1)χ2检验的显著性水平，-1则不进行检验。

数据挖掘实验报告Weka的数据聚类分析

甘肃政法学院本科生实验报告（2）姓名:学院:计算机科学学院专业:信息管理与信息系统班级:实验课程名称:数据挖掘实验日期:指导教师及职称:实验成绩:开课时间：2013—2014 学年一学期甘肃政法学院实验管理中心印制二．实验环境Win 7环境下的Eclipse三、实验内容在WEKA中实现K均值的算法,观察实验结果并进行分析。

四、实验过程与分析一、实验过程1、添加数据文件打开Weka的Explore，使用Open file点击打开本次实验所要使用的raff格式数据文件“auto93.raff”2、选择算法类型点击Cluster中的Choose，选择本次实验所要使用的算法类型“SimpleKMeans”3、得出实验结果选中“Cluster Mode”的“Use training set”，点击“Start”按钮，观察右边“Clusterer output”给出的聚类结果如下：=== Run information ===Scheme: weka.clusterers.SimpleKMeans -N 2 -S 10Relation: sInstances: 93Attributes: 23ManufacturerTypeCity_MPGHighway_MPGAir_Bags_standardDrive_train_typeNumber_of_cylindersEngine_sizeHorsepowerRPMEngine_revolutions_per_mile5528.8462 2622.3077 1 15.1346 4.7115 174.8654 100.2692 67.0385 36.8462 26.891 12.6069 2722.3077 0 16.4019Std Devs: N/A N/A 6.0746 5.7467 N/A N/A 0.7301 0.5047 40.8149 484.7019 377.1753 N/A 3.0204 0.848 11.2599 5.5735 2.4968 2.338 2.7753 2.3975 492.4971 N/A 7.9863Clustered Instances0 41 ( 44%)52 ( 56%)4、修改Seed值5、得出修改Seed值后的实验结果=== Run information ===Scheme: weka.clusterers.SimpleKMeans -N 2 -S 8Relation: sInstances: 93Attributes: 23ManufacturerTypeCity_MPGHighway_MPG二、实验分析本次实验采用的数据文件是“1993NewCarData ”。

引用 Weka学习二(聚类算法)

引用Weka学习二（聚类算法）智能信息处理2010-07-11 22:06:40 阅读18 评论0 字号：大中小订阅.引用其实的Weka学习二（聚类算法）上次我介绍了分类器的使用方法，这次我来介绍一下聚类算法。

聚类算法在数据挖掘里面被称之为无监督学习（unsupervised learning），这是与分类算法（supervised learning）相对的。

在它们两者之间还一种叫做半监督学习（semi-supervised learning）这个我会在后面的文章中重点介绍。

所谓无监督学习就是在预先不知道样本类别的情况下，由聚类算法来判别样本的类别的一种学习方法。

聚类算法的一般过程分为：1. 读入需预测样本2. 初始化聚类算法（并设置参数）3. 使用聚类算法对样本进行聚类4. 打印聚类结果我们来看下面的一个实例：package com.csdn;import java.io.File;import weka.clusterers.SimpleKMeans; import weka.core.DistanceFunction; import weka.core.EuclideanDistance; import weka.core.Instances;import weka.core.converters.ArffLoader;/** Date: 2009.4.2* by: Wang Yi* Email: wangyi19840906@ * QQ: 270135367**/public class SimpleCluster {/*** @param args*/public static void main(String[] args) {// TODO Auto-generated method stubInstances ins = null;Instances tempIns = null;SimpleKMeans KM = null;DistanceFunction disFun = null;try{/** 1.读入样本*/File file= new File("C:\\Program Files\\Weka-3-6\\data\\contact-lenses.arff");ArffLoader loader = new ArffLoader();loader.setFile(file);ins = loader.getDataSet();/** 2.初始化聚类器* 在3.6版本可以通过setDistanceFunction(DistanceFunction df)* 函数设置聚类算法内部的距离计算方式* 而在3.5版本里面默认的采用了欧几里得距离*/KM = new SimpleKMeans();//设置聚类要得到的类别数量KM.setNumClusters(2);/** 3.使用聚类算法对样本进行聚类*/KM.buildClusterer(ins);/** 4.打印聚类结果*/tempIns = KM.getClusterCentroids();System.out.println("CentroIds: " + tempIns);}catch(Exception e){e.printStackTrace();}}}我们可以看到读入样本的过程是与上一节的一样的方法。

Weka中BP神经网络的实践（参数调整以及结果分析）

Weka中BP神经⽹络的实践（参数调整以及结果分析）废话：周⽇讲了下神经⽹络，本来想的是以理论和实践相结合，前⾯讲讲神经⽹络，后⾯简单讲下在weka中怎么使⽤BP神经⽹络，可惜最后时间不够，⽽且姥姥的兴趣点跑到凸优化那⾥去了，所以没有讲成实践的部分，有点郁闷的。

为了不浪费了，就把这部分讲稿拿出来和⼤家分享⼀下，也希望对⼤家实践神经⽹络有所帮助。

因为是讲稿，讲的要⽐写的多，所以很多地⽅⼝语化和省略⽐较严重，⼤家凑合着看吧。

实践部分讲稿正⽂：Weka是什么？Weka是由新西兰怀卡托⼤学⽤Java开发的数据挖掘常⽤软件，Weka是怀卡托智能分析系统的缩写。

Weka限制在GNU通⽤公众证书的条件下发布，它⼏乎可以运⾏在所有操作系统平台上，包括Linux、Windows、Macintosh等。

Weka中BP神经⽹络的实践：Weka中的神经⽹络使⽤多层多层感知器实现BP神经⽹络。

让我们看看weka⾃带的帮助⽂件是怎么描述的：BP神经⽹络在weka中是分属这个部分的weka.classifiers.functions.MultilayerPerceptron其是⼀个使⽤了反向传播（backpropagation）的分类器。

你可以⼿⼯构造这个⽹络，⽤算法创建它，或者两者兼备。

这个⽹络可以在训练的过程中被监视和修改。

⽹络中的节点是Sigmoid的，除了当类别（class）是数值属性（numeric）的，这时输出节点变成了unthresholded linear units。

注：sigmoid如下图关于⾥⾯参数的配置如下图下⾯我们来看各个参数的具体意义：GUI弹出⼀个GUI界⾯。

其允许我们在神经⽹络训练的过程中暂停和做⼀些修改（altering）按左键添加⼀个节点（node）（节点将被⾃动选择以保证没有其他的节点被选择）选中⼀个节点：左键单击连接⼀个节点：⾸先选中⼀个起始节点，然后点击⼀个结束节点或者空⽩区域（这将创建⼀个新节点并与起始节点连接）。

weka的apriori算法的实验总结及体会

一、前言Weka是一款流行的数据挖掘工具，其内置了多种经典的数据挖掘算法。

其中，Apriori算法是一种用于发现数据集中频繁项集的经典算法。

在本次实验中，我们将对Weka中的Apriori算法进行实验，并总结经验体会。

二、实验准备1. 数据集准备：选择一个符合Apriori算法输入要求的数据集，本次实验选取了一个包含购物篮信息的数据集，用于分析不同商品之间的关联规则。

2. Weka环境准备：确保Weka软件已经安装并能够正常运行。

三、实验步骤1. 数据集加载：我们将选取的数据集导入Weka软件中，确保数据集能够正确显示。

2. 参数设置：在Weka中，Apriori算法有一些参数需要设置，如最小支持度、最小置信度等。

根据实际需求，设置适当的参数。

3. 算法执行：执行Apriori算法，观察结果。

可以得到频繁项集、关联规则等信息。

4. 结果分析：根据算法输出的结果，分析不同项集之间的关联规则，并进行对比和总结。

四、实验结果1. 频繁项集分析：通过Apriori算法的执行，得到了数据集中的频繁项集信息。

可以发现一些商品之间的频繁组合，为进一步的关联规则分析提供了基础。

2. 关联规则分析：根据频繁项集，进一步推导出了一些关联规则。

如果购买了商品A，那么购买商品B的概率较大。

这对于商家进行商品搭配和促销活动有一定的指导作用。

3. 算法性能评估：除了得到具体的关联规则外，还可以对算法的性能进行评估。

包括算法执行时间、内存占用、参数敏感性等方面的评估。

五、实验体会1. 算法优缺点：经过实验，我们发现Apriori算法在处理大规模数据集时存在一定的计算复杂度，需要进行优化才能适应大规模数据挖掘的需求。

但在小规模数据集上，其表现仍然较为理想。

2. 参数选择经验：在实验中，我们也总结出了一些参数选择的经验，如支持度和置信度的合理选择范围，以及对于不同数据集的适应性。

3. 应用前景展望：关联规则挖掘在电商、市场营销等领域有着广泛的应用前景，我们相信在未来的实际工作中，能够将所学到的知识应用到真实的业务场景中。

机器学习工具WEKA的使用总结,包括算法选择、属性选择、参数优化

一、属性选择：1、理论知识：见以下两篇文章：数据挖掘中的特征选择算法综述及基于WEKA的性能比较_陈良龙数据挖掘中约简技术与属性选择的研究_刘辉2、weka中的属性选择2.1评价策略（attribute evaluator）总的可分为filter和wrapper方法，前者注重对单个属性进行评价，后者侧重对特征子集进行评价。

Wrapper方法有：CfsSubsetEvalFilter方法有：CorrelationAttributeEval2.1.1 Wrapper方法：（1）CfsSubsetEval根据属性子集中每一个特征的预测能力以及它们之间的关联性进行评估，单个特征预测能力强且特征子集内的相关性低的子集表现好。

Evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them.Subsets of features that are highly correlated with the class while having low intercorrelation are preferred.For more information see:M. A. Hall (1998). Correlation-based Feature Subset Selection for Machine Learning. Hamilton, New Zealand.（2）WrapperSubsetEvalWrapper方法中，用后续的学习算法嵌入到特征选择过程中，通过测试特征子集在此算法上的预测性能来决定其优劣，而极少关注特征子集中每个特征的预测性能。

因此，并不要求最优特征子集中的每个特征都是最优的。

WEKA数据分析实验

WEKA 数据分析实验1.实验简介借助工具Weka 3.6 ，对数据样本进行测试，分类测试方法包括：朴素贝叶斯、决策树、随机数三类，聚类测试方法包括：DBScan，K均值两种；2.数据样本以熟悉数据分类的各类常用算法，以及了解Weka的使用方法为目的，本次试验中，采用的数据样本是Weka软件自带的“Vote”样本，如图：3.关联规则分析1)操作步骤：a)点击“Explorer”按钮，弹出“Weka Explorer”控制界面b)选择“Associate”选项卡；c)点击“Choose”按钮，选择“Apriori”规则d)点击参数文本框框，在参数选项卡设置参数如：e)点击左侧“Start”按钮2)执行结果：=== Run information ===Scheme: weka.associations.Apriori -I -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.5 -S -1.0 -c -1 Relation: voteInstances: 435Attributes: 17handicapped-infantswater-project-cost-sharingadoption-of-the-budget-resolutionphysician-fee-freezeel-salvador-aidreligious-groups-in-schoolsanti-satellite-test-banaid-to-nicaraguan-contrasmx-missileimmigrationsynfuels-corporation-cutbackeducation-spendingsuperfund-right-to-suecrimeduty-free-exportsexport-administration-act-south-africaClass=== Associator model (full training set) ===Apriori=======Minimum support: 0.5 (218 instances)Minimum metric <confidence>: 0.9Number of cycles performed: 10Generated sets of large itemsets:Size of set of large itemsets L(1): 12Large Itemsets L(1):handicapped-infants=n 236adoption-of-the-budget-resolution=y 253physician-fee-freeze=n 247religious-groups-in-schools=y 272anti-satellite-test-ban=y 239aid-to-nicaraguan-contras=y 242synfuels-corporation-cutback=n 264education-spending=n 233crime=y 248duty-free-exports=n 233export-administration-act-south-africa=y 269Class=democrat 267Size of set of large itemsets L(2): 4Large Itemsets L(2):adoption-of-the-budget-resolution=y physician-fee-freeze=n 219adoption-of-the-budget-resolution=y Class=democrat 231physician-fee-freeze=n Class=democrat 245aid-to-nicaraguan-contras=y Class=democrat 218Size of set of large itemsets L(3): 1Large Itemsets L(3):adoption-of-the-budget-resolution=y physician-fee-freeze=n Class=democrat 219Best rules found:1. adoption-of-the-budget-resolution=y physician-fee-freeze=n 219 ==> Class=democrat 219 conf:(1)2. physician-fee-freeze=n 247 ==> Class=democrat 245 conf:(0.99)3. adoption-of-the-budget-resolution=y Class=democrat 231 ==> physician-fee-freeze=n 219 conf:(0.95)4. Class=democrat 267 ==> physician-fee-freeze=n 245 conf:(0.92)5. adoption-of-the-budget-resolution=y 253 ==> Class=democrat 231 conf:(0.91)6. aid-to-nicaraguan-contras=y 242 ==> Class=democrat 218 conf:(0.9)3)结果分析：a)该样本数据，数据记录数435个，17个属性，进行了10轮测试b)最小支持度为0.5，即至少需要218个实例；c)最小置信度为0.9；d)进行了10轮搜索，频繁1项集12个，频繁2项集4个，频繁3项集1个；4.分类算法-随机树分析1)操作步骤：a)点击“Explorer”按钮，弹出“Weka Explorer”控制界面b)选择“Classify ”选项卡；c)点击“Choose”按钮，选择“trees” “RandomTree”规则d)设置Cross-validation 为10次e)点击左侧“Start”按钮2)执行结果：=== Run information ===Scheme:weka.classifiers.trees.RandomTree -K 0 -M 1.0 -S 1Relation: voteInstances:435Attributes:17handicapped-infantswater-project-cost-sharingadoption-of-the-budget-resolutionphysician-fee-freezeel-salvador-aidreligious-groups-in-schoolsanti-satellite-test-banaid-to-nicaraguan-contrasmx-missileimmigrationsynfuels-corporation-cutbackeducation-spendingsuperfund-right-to-suecrimeduty-free-exportsexport-administration-act-south-africaClassTest mode:10-fold cross-validation=== Classifier model (full training set) ===RandomTree==========el-salvador-aid = n| physician-fee-freeze = n| | duty-free-exports = n| | | anti-satellite-test-ban = n| | | | synfuels-corporation-cutback = n| | | | | crime = n : republican (0.96/0)| | | | | crime = y| | | | | | handicapped-infants = n : democrat (2.02/0.01) | | | | | | handicapped-infants = y : democrat (0.05/0)| | | | synfuels-corporation-cutback = y| | | | | handicapped-infants = n : democrat (0.79/0.01)| | | | | handicapped-infants = y : democrat (2.12/0)| | | anti-satellite-test-ban = y| | | | adoption-of-the-budget-resolution = n| | | | | handicapped-infants = n : democrat (1.26/0.01)| | | | | handicapped-infants = y : republican (1.25/0.25)| | | | adoption-of-the-budget-resolution = y| | | | | handicapped-infants = n| | | | | | crime = n : democrat (5.94/0.01)| | | | | | crime = y : democrat (5.15/0.12)| | | | | handicapped-infants = y : democrat (36.99/0.09)| | duty-free-exports = y| | | crime = n : democrat (124.23/0.29)| | | crime = y| | | | handicapped-infants = n : democrat (16.9/0.38)| | | | handicapped-infants = y : democrat (8.99/0.02)| physician-fee-freeze = y| | immigration = n| | | education-spending = n| | | | crime = n : democrat (1.09/0)| | | | crime = y : democrat (1.01/0.01)| | | education-spending = y : republican (1.06/0.02)| | immigration = y| | | synfuels-corporation-cutback = n| | | | religious-groups-in-schools = n : republican (3.02/0.01)| | | | religious-groups-in-schools = y : republican (1.54/0.04)| | | synfuels-corporation-cutback = y : republican (1.06/0.05)el-salvador-aid = y| synfuels-corporation-cutback = n| | physician-fee-freeze = n| | | handicapped-infants = n| | | | superfund-right-to-sue = n| | | | | crime = n : democrat (1.36/0)| | | | | crime = y| | | | | | mx-missile = n : republican (1.01/0)| | | | | | mx-missile = y : democrat (1.01/0.01)| | | | superfund-right-to-sue = y : democrat (4.83/0.03)| | | handicapped-infants = y : democrat (8.42/0.02)| | physician-fee-freeze = y| | | adoption-of-the-budget-resolution = n| | | | export-administration-act-south-africa = n| | | | | mx-missile = n : republican (49.03/0)| | | | | mx-missile = y : democrat (0.11/0)| | | | export-administration-act-south-africa = y| | | | | duty-free-exports = n| | | | | | mx-missile = n : republican (60.67/0)| | | | | | mx-missile = y : republican (6.21/0.15)| | | | | duty-free-exports = y| | | | | | aid-to-nicaraguan-contras = n| | | | | | | water-project-cost-sharing = n| | | | | | | | mx-missile = n : republican (3.12/0)| | | | | | | | mx-missile = y : democrat (0.01/0)| | | | | | | water-project-cost-sharing = y : democrat (1.15/0.14) | | | | | | aid-to-nicaraguan-contras = y : republican (0.16/0)| | | adoption-of-the-budget-resolution = y| | | | anti-satellite-test-ban = n| | | | | immigration = n : democrat (2.01/0.01)| | | | | immigration = y| | | | | | water-project-cost-sharing = n| | | | | | | mx-missile = n : republican (1.63/0)| | | | | | | mx-missile = y : republican (1.01/0.01)| | | | | | water-project-cost-sharing = y| | | | | | | superfund-right-to-sue = n : republican (0.45/0)| | | | | | | superfund-right-to-sue = y : republican (1.71/0.64) | | | | anti-satellite-test-ban = y| | | | | mx-missile = n : republican (7.74/0)| | | | | mx-missile = y : republican (4.05/0.03)| synfuels-corporation-cutback = y| | adoption-of-the-budget-resolution = n| | | superfund-right-to-sue = n| | | | anti-satellite-test-ban = n| | | | | physician-fee-freeze = n : democrat (1.39/0.01)| | | | | physician-fee-freeze = y| | | | | | water-project-cost-sharing = n : republican (1.01/0)| | | | | | water-project-cost-sharing = y : democrat (1.05/0.05)| | | | anti-satellite-test-ban = y : democrat (1.13/0.01)| | | superfund-right-to-sue = y| | | | education-spending = n| | | | | physician-fee-freeze = n| | | | | | crime = n : democrat (0.09/0)| | | | | | crime = y| | | | | | | handicapped-infants = n : democrat (1.01/0.01)| | | | | | | handicapped-infants = y : democrat (1/0)| | | | | physician-fee-freeze = y| | | | | | immigration = n| | | | | | | export-administration-act-south-africa = n : democrat(0.34/0.11)| | | | | | | export-administration-act-south-africa = y| | | | | | | | crime = n : democrat (0.16/0)| | | | | | | | crime = y| | | | | | | | | mx-missile = n| | | | | | | | | | handicapped-infants = n : republican (0.29/0) | | | | | | | | | | handicapped-infants = y : republican (1.88/0.87) | | | | | | | | | mx-missile = y : democrat (0.01/0)| | | | | | immigration = y : republican (1.01/0)| | | | education-spending = y| | | | | physician-fee-freeze = n| | | | | | handicapped-infants = n : democrat (1.51/0.01)| | | | | | handicapped-infants = y : democrat (2.01/0)| | | | | physician-fee-freeze = y| | | | | | crime = n : republican (1.02/0)| | | | | | crime = y| | | | | | | export-administration-act-south-africa = n| | | | | | | | handicapped-infants = n| | | | | | | | | immigration = n| | | | | | | | | | mx-missile = n| | | | | | | | | | | water-project-cost-sharing = n : democrat (1.01/0.01)| | | | | | | | | | | water-project-cost-sharing = y : republican (1.81/0)| | | | | | | | | | mx-missile = y : democrat (0.01/0)| | | | | | | | | immigration = y| | | | | | | | | | mx-missile = n : republican (2.78/0)| | | | | | | | | | mx-missile = y : democrat (0.01/0)| | | | | | | | handicapped-infants = y| | | | | | | | | mx-missile = n : republican (2/0)| | | | | | | | | mx-missile = y : democrat (0.4/0)| | | | | | | export-administration-act-south-africa = y| | | | | | | | mx-missile = n : republican (8.77/0)| | | | | | | | mx-missile = y : democrat (0.02/0)| | adoption-of-the-budget-resolution = y| | | anti-satellite-test-ban = n| | | | handicapped-infants = n| | | | | crime = n : democrat (2.52/0.01)| | | | | crime = y : democrat (7.65/0.07)| | | | handicapped-infants = y : democrat (10.83/0.02)| | | anti-satellite-test-ban = y| | | | physician-fee-freeze = n| | | | | handicapped-infants = n| | | | | | crime = n : democrat (2.42/0.01)| | | | | | crime = y : democrat (2.28/0.03)| | | | | handicapped-infants = y : democrat (4.17/0.01)| | | | physician-fee-freeze = y| | | | | mx-missile = n : republican (2.3/0)| | | | | mx-missile = y : democrat (0.01/0)Size of the tree : 143Time taken to build model: 0.01seconds=== Stratified cross-validation ====== Summary ===Correctly Classified Instances 407 93.5632 %Incorrectly Classified Instances 28 6.4368 %Kappa statistic 0.8636Mean absolute error 0.0699Root mean squared error 0.2379Relative absolute error 14.7341 %Root relative squared error 48.8605 %Total Number of Instances 435=== Detailed Accuracy By Class ===TP Rate FP Rate Precision Recall F-Measure ROC Area Class0.955 0.095 0.941 0.955 0.948 0.966 democrat0.905 0.045 0.927 0.905 0.916 0.967 republicanWeighted Avg. 0.936 0.076 0.936 0.936 0.935 0.966 === Confusion Matrix ===a b <-- classified as255 12 | a = democrat16 152 | b = republican3)结果分析：a)该样本数据，数据记录数435个，17个属性，进行了10轮交叉验证b)随机树长143c)正确分类共407个，正确率达93.5632 %d)错误分类28个，错误率6.4368 %e)测试数据的正确率较好5.分类算法-随机树分析1)操作步骤：a)点击“Explorer”按钮，弹出“Weka Explorer”控制界面b)选择“Classify ”选项卡；c)点击“Choose”按钮，选择“trees” “J48”规则d)设置Cross-validation 为10次e)点击左侧“Start”按钮2)执行结果：=== Run information ===Scheme:weka.classifiers.trees.J48 -C 0.25 -M 2Relation: voteInstances:435Attributes:17handicapped-infantswater-project-cost-sharingadoption-of-the-budget-resolutionphysician-fee-freezeel-salvador-aidreligious-groups-in-schoolsanti-satellite-test-banaid-to-nicaraguan-contrasmx-missileimmigrationsynfuels-corporation-cutbackeducation-spendingsuperfund-right-to-suecrimeduty-free-exportsexport-administration-act-south-africaClassTest mode:10-fold cross-validation=== Classifier model (full training set) ===J48 pruned tree------------------physician-fee-freeze = n: democrat (253.41/3.75)physician-fee-freeze = y| synfuels-corporation-cutback = n: republican (145.71/4.0)| synfuels-corporation-cutback = y| | mx-missile = n| | | adoption-of-the-budget-resolution = n: republican (22.61/3.32) | | | adoption-of-the-budget-resolution = y| | | | anti-satellite-test-ban = n: democrat (5.04/0.02)| | | | anti-satellite-test-ban = y: republican (2.21)| | mx-missile = y: democrat (6.03/1.03)Number of Leaves : 6Size of the tree : 11Time taken to build model: 0.06seconds=== Stratified cross-validation ====== Summary ===Correctly Classified Instances 419 96.3218 % Incorrectly Classified Instances 16 3.6782 % Kappa statistic 0.9224Mean absolute error 0.0611Root mean squared error 0.1748Relative absolute error 12.887 %Root relative squared error 35.9085 %Total Number of Instances 435=== Detailed Accuracy By Class ===TP Rate FP Rate Precision Recall F-Measure ROC Area Class0.97 0.048 0.97 0.97 0.97 0.971 democrat0.952 0.03 0.952 0.952 0.952 0.971 republicanWeighted Avg. 0.963 0.041 0.963 0.963 0.963 0.971=== Confusion Matrix ===a b <-- classified as259 8 | a = democrat8 160 | b = republican3)结果分析：a)该样本数据，数据记录数435个，17个属性，进行了10轮交叉验证b)决策树分6级，长度11c)正确分类共419个，正确率达96.3218 %d)错误分类16个，错误率3.6782 %e)测试结果接近随机数，正确率较高6.分类算法-朴素贝叶斯分析1)操作步骤：a)点击“Explorer”按钮，弹出“Weka Explorer”控制界面b)选择“Classify ”选项卡；c)点击“Choose”按钮，选择“bayes” “Naive Bayes”规则d)设置Cross-validation 为10次e)点击左侧“Start”按钮2)执行结果：=== Stratified cross-validation ====== Summary ===Correctly Classified Instances 392 90.1149 %Incorrectly Classified Instances 43 9.8851 %Kappa statistic 0.7949Mean absolute error 0.0995Root mean squared error 0.2977Relative absolute error 20.9815 %Root relative squared error 61.1406 %Total Number of Instances 435=== Detailed Accuracy By Class ===TP Rate FP Rate Precision Recall F-Measure ROC Area Class0.891 0.083 0.944 0.891 0.917 0.973democrat0.917 0.109 0.842 0.917 0.877 0.973republicanWeighted Avg. 0.901 0.093 0.905 0.901 0.902 0.973 === Confusion Matrix ===a b <-- classified as238 29 | a = democrat14 154 | b = republican3)结果分析a)该样本数据，数据记录数435个，17个属性，进行了10轮交叉验证b)正确分类共392个，正确率达90.1149 %c)错误分类43个，错误率9.8851 %d)测试正确率较高7.分类算法-RandomTree、决策树、朴素贝叶斯结果比较：RandomTree 决策树朴素贝叶斯正确率93.5632% 96.3218 % 90.1149 %混淆矩阵 a b <-- classified as255 12 | a = democrat16 152 | b = republican a b <-- classified as259 8 | a = democrat8 160 | b = republicana b <-- classified as238 29 | a = democrat14 154 | b =republican标准误差48.8605 % 35.9085 % 61.1406 % 根据以上对照结果，三类分类算法对样板数据Vote测试准确率类似；8.。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

Output per-class stats
输出每类的统计信息输出每个分类的 TP rate，FPrate，查准率/查全率以及 True/False 统计信息。
Output entropy evaluation measures ？输出熵评估量度输出中包括熵评估度量
Output confusion matrix
取出特定百分比的数据作为训练数据集训练模型，其他数据作为测试数据。
以上所述训练数据集和测试数据集均为模型验证时候的数据集，与模型的建立无关，模型建立均用实验提供的全部训练数据集。
3.2. 分类算法输出选项界面
英文名称
中文翻译
配置说明
Output model
输出模型
输出通过完整训练集得到的分类模型，以便能够浏览、可视化等。
EuclideanDistance
doNotCheckCapabilities

不检查适用范围
False
dontReplaceMissingValues
不替换缺失值
False
fastDistanceCalc
加速距离计算
False
initializationMethod
初始化质心方法
Random
maxIterations numClusters numExecutionSlots preserveInstancesOrder
最大迭代次数
100
maximumNumberOfClusters
最大的聚类数目
-1
minLogLikelihoodImprovementIter
ating
minLogLikelihoodImprovementCV
minStdDev
numClusters
簇数目
-1
numExecutionSlots
最大执行线程数目 1
True
seed
随机数种子
10
使用的随机数种子，不随机化则该值设为-1
2.3. EM 聚类算法
2.3.1. EM 聚类算法适用范围
Class
类
No class
Attributes Numeric Empty nominal Nominal Missing values Unary Binary
属性
2.3.2. EM 聚类算法参数界面
2.3.3. EM 聚类算法参数说明
英文名称 debug
中文翻译设置调试模式
默认值 False
displayModelInOldFormat
以旧格式显示结果 False
doNotCheckCapabilities
不检查适用范围
False
maxIterations
Conviction(确信度)
0.9
根据 metricType 取值不同
规则项集数目占规则前件数目比例；car 为 True，metricType 只能用 confidence。
P(A,B)/P(A)P(B)；规则前件和规则后件同时发生的概率除以分布单独发生的概率之积； Lift=1 时表示 A 和 B 独立，数值越大前后件关联性越强。
P(A,B)-P(A)P(B)；Leverage=0 时 A 和 B 独立，数值越大 A 和 B 的关联性越强。
P(A)P(!B)/P(A,!B) （!B 表示 B 没有发生） Conviction 也是用来衡量 A 和 B 的独立性。从它和 lift 的关系（对 B 取反，代入 Lift 公式后求倒数）可以看出，这个值越大, A、B 越关联。
Cross-validation
Percentage split
中文翻译使用训练集提供测试集
交叉验证
分割百分比
配置说明
使用训练集训练并直接使用训练集测试。
使用训练集训练模型，从文件中加载一组测试实例，单击 “Set...” 按钮选择测试文件，进行模型测试。
把数据分成 k 份，从第 1 份开始，作为测试数据，其他作为训练数据集，一直到第 k 份结束，验证模型的能力。
verbose
详细模式
False
False True
算法不以冗余模式运行算法以冗余模式运行
2. 聚类
2.1. weka 聚类主界面及参数说明
2.1.1. 聚类算法主界面
2.1.2. 聚类算法主界面参数说明
英文名称
中文翻译
配置说明
Use training set
使用训练集
使用训练集训练并直接使用训练集测试。
默认值取值范围
参数说明
car
分类关联分析
False
False
返回常规的关联分析规则
True
返回指定分类属性的关联规则
classIndex delta
分类属性索引
-1
delta
0.05
{-1,[1,N]} int (0,1)
-1 代表最后一列，设置的数字代表相应的列作为分类属性；Car 为 True 时生效。每次迭代 upperBoundMinSupport 减少的数值，直到最小支持度或设定规则数目。
取样的随机种子
随机抽取测试数据时产生随机数的种子
Preserve order for split
取样时保持顺序
抽取测试数据集时是否保持数据的顺序抽取，如果不选择此选项，则随机抽取。
Output source code
输出源代码
输出构建模型的 java 源代码，并能指定 java 类的名称。
3.3. 分类算法评价尺度参数界面及解释
最大迭代次数簇数目最大执行线程数目保持实例顺序
reduceNumberOfDistanceCalcsVi 减少计算距离数目 aCanopies
500 2 1 False
False
取值范围 [1,+∞)
? ? (T2,+∞) (-∞,T1) False True False True EuclideanDistance Manhattan distance False True False True False True Random k-means++ Canopy farthest first [1,+∞) [2,N) [1,?] False True False
Store cluster for visualization 为可视化保存簇选择后训练完成后，保存簇以供可视化使用
2.2. SimpleKMeans 算法
2.2.1. SimpleKMeans 算法参数配置用户界面和开发模式界面
2.2.2. SimpleKMeans 聚类算法参数配置说明
英文名称 canopyMaxNumCanopiesToHoldI nMemory canopyMinimumCanopyDensity canopyPeriodicPruningRate canopyT1 canopyT2 debug
输出混淆矩阵
输出中包括分类器对测试数据集预测得到的混淆矩阵
Store prediction for visualization
为可视化保留预测
保存分类器的预测结果，以便可视化。
Error plot point size proportional to ？ margin
Output prediction
中文翻译内存中最大 canopy 数目
最低 canopy 密度修剪周期 Canopy 聚类 T1 半径 Canopy 聚类 T2 半径设置调试模式
默认值 100
2.0 10000 -1.25 -1 False
displayStdDevs
显示标准差
False
distanceFunction
距离函数
Confidence(0,1); lift >1; leverage >0; conviction (0,1)
numRules
规则数目
10
[1,+∞] int
关联算法产生规则的数目
outputItemSets
输出项集
False
False
不输出频繁项集
True
输出频繁项集
removeAllMissingCols 移除空列
参数说明如果用 canopy 聚类方法进行初始化，这个参数就是在内存中保存的最大的候选 canopies 数目。在使用 canopy 初始化时，在修剪时的 canopy 最低密度。如果用 canopy 初始化，参数为修剪低密度 canopies 周期。 canopy 聚类时 T1 半径，当小于 0 时，T1=（-values）*T2。 canopy 聚类时 T2 半径，当值为负数时，根据属性标准差求出。调试信息不输出输出调试信息不显示数值属性的标准差，不统计标称属性每类的数目。显示数值属性的标准差，或统计标称属性没类的数目。欧氏距离马氏距离在聚类之前，检查聚类器的使用范围。在聚类之前，不检查聚类器的使用范围。在全局范围内用平均值或中数替换缺失值不替换根据 cut-off 值加速距离计算不加速距离计算随机选取质心先使用 k-means++聚类算法初始化质心先使用 Canopy 聚类算法初始化质心先使用 farthest firsty 聚类算法初始化质心迭代过程中达到最大迭代次数结束本次聚类。设定聚类个数，即最后被聚成几类。设置成可用的 cpu 数目保持实例顺序不保持实例顺序在用 canopy 聚类初始化时，减少计算距离的数目。
Supplied test set
提供测试集
使用训练集训练模型，从文件中加载一组测试实例，单击 “Set...” 按钮选择测试文件，进行模型测试。