SVM Guide

合集下载

svm训练参数

svm训练参数
在使用支持向量机（SVM）进行训练时，有一些重要的参数需要
选择。

以下是一些常见的SVM训练参数：
1. C参数（惩罚参数）：控制了决策边界的平滑度。

较小的C值将产生较大的决策边界间隔，较大的C值将产生更小的决策边界间隔。

选择C的值取决于数据集的特征。

2. 核函数：SVM可以使用不同的核函数来进行非线性分类。

常见的核函数包括线性核函数、多项式核函数和径向基函数（RBF）核函数。

选择适当的核函数取决于数据集的特性。

3. gamma参数（RBF核函数特有）：控制了决策边界的曲率。

较
高的gamma值将产生更复杂的决策边界，较低的gamma值将产生更简
单的决策边界。

4. degree参数（多项式核函数特有）：控制了多项式核函数的
阶数。

较高的阶数可以使决策边界更复杂，但可能导致过拟合。

5. coef0参数（多项式和sigmoid核函数特有）：控制了多项式核函数和sigmoid核函数的影响。

较大的coef0值将产生更复杂的决
策边界。

这些参数的选择通常需要通过交叉验证等方法来进行调整，以获
得最佳分类性能。

大数据十大经典算法SVM-讲解PPT

大数据十大经典算法svm-讲解
contents
目录
• 引言 • SVM基本原理 • SVM模型构建与优化 • SVM在大数据处理中的应用 • SVM算法实现与编程实践 • SVM算法性能评估与改进 • 总结与展望
01 引言
算法概述
SVM（Support Vector Machine，支持向量机）是一种监督学习模型，用于数据分类和回归分析。
性能评估方法
01
准确率评估
通过计算模型在测试集上的准确率来评估SVM算法的性能，准确率越
高，说明模型分类效果越好。
02
混淆矩阵评估
通过构建混淆矩阵，可以计算出精确率、召回率、F1值等指标，更全面
地评估SVM算法的性能。
03
ROC曲线和AUC值评估
通过绘制ROC曲线并计算AUC值，可以评估SVM算法在不同阈值下的
核函数是SVM的重要组成部分，可将数据映射到更高维的空间，使得原本线性不可分的数据变得线性可分。常见的核函数有线性核、多项式核、高斯核等。
SVM的性能受参数影响较大，如惩罚因子C、核函数参数等。通过交叉验证、网格搜索等方法可实现SVM参数的自动调优，提高模型性能。
SVM在文本分类、图像识别、生物信息学等领域有广泛应用。通过具体案例，可深入了解 SVM的实际应用效果。
SVM算法实现步骤
模型选择
选择合适的SVM模型，如CSVM、ν-SVM或One-class SVM等。
模型训练
使用准备好的数据集对SVM模型进行训练，得到支持向量和决策边界。
数据准备
准备用于训练的数据集，包括特征提取和标签分配。
参数设置
设置SVM模型的参数，如惩罚系数C、核函数类型及其参数等。

《支持向量机SVM》课件

多分类SVM
总结词
多类分类支持向量机可以使用不同的核函数和策略来解决多类分类问题。
详细描述
多类分类支持向量机可以使用不同的核函数和策略来解决多类分类问题。常用的核函数有线性核、多项式核和RBF核等。此外，一些集成学习技术也可以与多类分类SVM结合使用，以提高分类性能和鲁棒性。
03
SVM的训练与优化
细描述
对于非线性数据，线性不可分SVM通过引入核函数来解决分类问题。核函数可以将数据映射到更高维空间，使得数据在更高维空间中线性可分。常用的核函数有线性核、多项式核和径向基函数（RBF）。
通过调整惩罚参数C和核函数参数，可以控制模型的复杂度和过拟合程度。
详细描述
多分类支持向量机可以通过两种策略进行扩展：一对一（OAO）和一对多（OAA）。在OAO策略中，对于n个类别的多分类问题，需要构建n(n-1)/2个二分类器，每个二分类器处理两个类别的分类问题。在OAA策略中，对于n个类别的多分类问题，需要构建
n个二分类器，每个二分类器处理一个类别与剩余类别之间的分类问题。
鲁棒性高
SVM对噪声和异常值具有一定的鲁棒性，这使得它在许多实际应用中表现良好。
SVM的缺点
计算复杂度高
对于大规模数据集，SVM的训练时间可能会很长，因为其需要解决一个二次规划问题。
对参数敏感
SVM的性能对参数的选择非常敏感，例如惩罚因子和核函数参数等，需要仔细调整。
对非线性问题处理有限
SVM的优点
分类效果好
SVM在许多分类任务中表现出了优秀的性能，尤其在处理高维数据和解决非线性问题上。
对异常值不敏感
SVM在训练过程中会寻找一个最优超平面，使得该平面的两侧的类别距离最大化，这使得SVM对异常值的影响较小。

FANUC伺服调整教材

①设定1902#0#1=0
#7 1902 #6 #5 #4 #3 #2 #1 ASE #0 FMD
#1：ASE
#0：FMD
FSSB的设定方式为自动设定方式时 0：自动设定未完成。 1：自动设定已经完成。 0：FSSB的设定方式为自动方式。 1：FSSB的设定方式为手动方式。
14
BEIJING-FANUC
2020
2001 1820 2084/2085 2022 2023 2024 1821

18
BEIJING-FANUC
第一章伺服电机规格及初始化
1、初始化设定位
设定初始化设定位
#7 初始化设定位 #1：DGP 0：进行伺服参数的初始设定。 1：结束伺服参数的初始设定。 #6 #5 #4 #3 #2 #1 DGP #0

3
BEIJING-FANUC
目录
四、自动增益调整
五、加工条件选择功能第四章：SERVO GUIDE软件的使用及调试方法一、Servo Guide软件介绍二、Servo Guide连接三、 Servo Guide调整步骤第五章：伺服调整实例分析一、工件表面光洁度调整案例

16
BEIJING-FANUC
第一章伺服电机规格及初始化
四、伺服初始化
伺服初始化是在完成了FSSB连接与设定的基础上进行电机的一转移动量以及电机种类的设定。伺服电机必须经过初始化相关参数正确设定后才能够正常运行。
设定参数3111后，伺服设定画面能够显示。

第一章伺服电机规格及初始化
②按伺服电机连接顺序设定参数1023的值。 1023 伺服轴号
设定控制轴为放大器连接的第几个伺服轴，通常控制轴号与伺服轴号设定相同。

SVM支持向量机PPT

核函数的改进方向可能包括研究新的核函数形式，如高阶核函数、多核函数等，以提高SVM的分类精度和泛化能力。
增量学习与在线学习
增量学习是指模型能够随着新数据的不断加入而进行自我更新和调整的能力。在线学习则是增量学习的一种特殊形式，它允许模型在实时数据流上进行学习和更新。
随着大数据时代的到来，增量学习和在线学习在许多领域中变得越来越重要。未来的SVM研究将更加注重增量学习和在线学习方面的研究，以提高SVM在处理大规模、高维数据集时的效率和准确性。
SVM
如前所述，SVM通过找到能够将不同类别的数据点最大化分隔的决策边界来实现分类。 SVM具有较弱的表示能力和学习能力，但具有较好的泛化能力。
比较
神经网络和SVM在分类问题上有不同的优势和局限性。神经网络适合处理复杂和高度非线性问题，而SVM在处理大规模和线性可分数据集时表现更佳。选择哪种算法取决于具体问题和数据特性。
与贝叶斯分类器比较
贝叶斯分类器
贝叶斯分类器是一种基于概率的分类方法。它通过计算每个类别的概率来对新的输入数据进行分类。贝叶斯分类器具有简单和高效的特点，但需要较大的训练样本。
SVM
如前所述，SVM通过找到能够将不同类别的数据点最大化分隔的决策边界来实现分类。SVM具有较好的泛化能力和处理大规模数据集的能力，但计算复杂度较高。
svm支持向量机
contents
目录
• SVM基本概念 • SVM分类器 • SVM优化问题 • SVM应用领域 • SVM与其他机器学习算法的比较 • SVM未来发展方向
01 SVM基本概念
定义
定义
SVM（Support Vector Machine）是一种监督学习模型，用于分类和回归分析。

FANUC系统调试作业指导书

K1.2 K2.0 K2.1
润滑泵使能
10 把刀库
K1.2=0 润滑泵受系统控制 K1.2=1 润滑泵不受系统控制 K2.0=1(K0.7=K2.1=K2.2=K2.3=0) 斗笠式刀库 D0=10 K2.0=K0.7=1(K2.1=K2.2=K2.3=0)圆盘式刀库 C2=10
16 把刀库
K2.1=1(K0.7=K2.0=K2.2=K2.3=0)斗笠式刀库 D0=16 K2.1=K0.7=1(K2.0=K2.2=K2.3=0)圆盘式刀库 C2=16
地址
Y2.0 Y2.1 Y2.2 Y2.3 Y2.4 Y2.5 Y2.6 Y2.7 Y3.0 Y3.1 Y3.2 Y3.3 Y3.4 Y3.5 Y3.6 Y3.7 Y5.3
电机电压是否正常，至伺服变压器电压及伺服变压器输出电压是否正常，至变频器电压
是否正常，润滑泵工作电压是否正常。
③ 严禁带电检查任何线路，如果不能解决问题，请及时报告技术部。
在 BOOT 引导画面下，将 PMC 和 SRAM 参数装入系统。如果机床带刀库，请根据刀库
装载类型将刀库宏程序（O9001）输入系统。
书》
4. 检查工作灯、报警灯、就绪灯、排屑器、冷却泵、润滑泵、松刀按钮、刀库马达、《发
刀臂马达，要求动作正确。机床 5. 对 RS232 通讯接口进行试验，要求通讯可靠。 8 功能 6. 对机床的润滑、冷却油路进行检查，要求密封可靠，冷却充分，检查 7. 润滑良好，油路系统不得有渗漏现象。
那科参数说明书》
4020/4133
1320/1321
1420
主轴转速3744/4020
GSVM5030 258/258/258 500/300/300 6000

svm向量机的使用流程

SVM向量机的使用流程1. 简介SVM（Support Vector Machine，支持向量机）是一种常用的机器学习算法，广泛应用于分类和回归问题。

它基于统计学习理论中的结构风险最小化原则，在数据平面上构建出一个最优的超平面，用于分类不同类别的样本数据。

2. 基本原理SVM基于以下两个关键概念：2.1 最大间隔分类器SVM试图寻找一个超平面，使得不同类别的样本点能够被它们之间的最大间隔所分开。

这样的超平面称为最大间隔分类器。

2.2 支持向量支持向量是距离最大间隔分类器最近的样本点，它们对于定义超平面和分类决策起到关键的作用。

3. 使用流程SVM的使用流程主要包括以下几个步骤：3.1 数据准备首先，需要准备用于训练和测试的数据。

数据应该包含已知类别的样本点，并且每个样本点都应该由一些特征组成。

3.2 数据预处理在使用SVM之前，需要对数据进行预处理。

这包括特征选择、特征缩放和数据标准化等步骤，以确保数据的质量和一致性。

3.3 模型训练使用训练集来训练SVM模型。

在训练过程中，SVM将根据具体的核函数和超参数来寻找最优的超平面。

使用测试集对训练完成的SVM模型进行评估。

常用的评估指标包括准确率、精确率、召回率和F1值等。

3.5 模型优化根据评估结果，可以调整SVM模型的超参数，以优化模型的性能。

常见的超参数包括核函数类型、正则化系数等。

3.6 模型应用训练完成并优化的SVM模型可以用于预测新样本的类别。

通过输入新样本的特征，SVM将输出预测的类别。

4. 实例演示下面以一个简单的二分类问题为例，演示SVM的使用流程：4.1 数据准备准备一个包含两个类别的样本数据集，每个类别有若干个样本点。

每个样本点都有一些特征，比如身高、体重等。

4.2 数据预处理对数据进行特征选择，保留与分类相关性较高的特征。

然后，进行特征缩放和数据标准化，以确保不同特征的量纲一致。

4.3 模型训练使用训练集对SVM模型进行训练。

svm基本结构

支持向量机（SVM）是一种广泛使用的监督学习算法，主要用于分类任务。

SVM的基本结构可以分为以下几个核心部分：1. 数据集：SVM算法输入的是一个包含多个样本的数据集，每个样本由一组特征和一个标签组成。

2. 特征空间：SVM的第一步是将原始数据映射到一个更高维度的特征空间。

这样做通常是为了找到一个合适的分离超平面，该超平面能够最好地分隔不同的类别。

3. 支持向量：在特征空间中，最靠近分离超平面的训练样本点被称为支持向量。

这些点是决定超平面位置的关键因素。

4. 分离超平面：SVM的目标是找到一个超平面，它能够最大化两个类别之间的间隔（即支持向量之间的距离）。

5. 软间隔：在实际应用中，可能存在一些难以精确分类的样本。

为了提高模型的泛化能力，SVM允许存在一些违反分类规则的样本，即引入软间隔的概念，允许一定的误分类。

6. 最优边界：除了寻找一个合适的分离超平面之外，SVM也致力于使离群点（即那些距离超平面最近的点）尽可能远离决策边界。

7. 核函数：当数据不是线性可分的时候，SVM通过使用核技巧将数据映射到更高维的空间，使之变得线性可分。

常用的核函数包括线性核、多项式核、径向基函数（RBF）核和sigmoid 核。

8. 正则化：为了避免过拟合，SVM可以通过引入正则化项来控制模型的复杂度。

常见的正则化技术包括L1正则化和L2正则化。

9. 优化问题：SVM的目标函数可以通过拉格朗日乘子法转换成一个凸优化问题，该问题可以通过各种优化算法求解，例如序列最小优化（SMO）算法。

SVM的结构和原理使得它非常适合处理中小规模的数据集，并且在许多实际应用中取得了很好的性能。

然而，当面对非常大的数据集时，SVM可能会遇到计算效率和存储效率的问题。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

An Idiot’s guide to Support vectormachines (SVMs)R. Berwick, Village IdiotSVMs: A NewGeneration of Learning Algorithms •Pre 1980:–Almost all learning methods learned linear decision surfaces.–Linear learning methods have nice theoretical properties •1980’s–Decision trees and NNs allowed efficient learning of non-linear decision surfaces–Little theoretical basis and all suffer from local minima •1990’s–Efficient learning algorithms for non-linear functions based on computational learning theory developed–Nice theoretical properties.Key Ideas•Two independent developments within last decade–Computational learning theory–New efficient separability of non-linearfunctions that use “kernel functions”•The resulting learning algorithm is an optimization algorithm rather than a greedy search.Statistical Learning Theory•Systems can be mathematically described as a system that–Receives data (observations) as input and–Outputs a function that can be used to predictsome features of future data.•Statistical learning theory models this as a function estimation problem •Generalization Performance (accuracy in labeling test data) is measuredOrganization•Basic idea of support vector machines –Optimal hyperplane for linearly separablepatterns–Extend to patterns that are not linearlyseparable by transformations of original data tomap into new space –Kernel function •SVM algorithm for pattern recognition Unique Features of SVM’s andKernel Methods•Are explicitly based on a theoretical model of learning•Come with theoretical guarantees about their performance•Have a modular design that allows one to separately implement and design their components•Are not affected by local minima•Do not suffer from the curse of dimensionalitySupport Vectors•Support vectors are the data points that lie closest to the decision surface•They are the most difficult to classify•They have direct bearing on the optimum location of the decision surface•We can show that the optimal hyperplane stems from the function class with the lowest “capacity” (VC dimension).Recall: Which Hyperplane?•In general, lots of possiblesolutions for a,b,c.•Support Vector Machine(SVM) finds an optimalsolution. (wrt what cost?)Support Vector Machine (SVM)Support vectors Maximize margin•SVMs maximize the margin around the separating hyperplane.•The decision function is fully specified by a subset of training samples, the support vectors .•Quadratic programming problem•Text classification method du jour Separation by Hyperplanes•Assume linear separability for now:–in 2 dimensions, can separate by a line–in higher dimensions, need hyperplanes•Can find separating hyperplane by linear programming (e.g. perceptron):–separator can be expressed as ax + by = cLinear Programming / PerceptronFind a,b,c, such thata x+b y≥c for red pointsa x+b y≤c for green points.Which Hyperplane?In general, lots of possiblesolutions for a,b,c.Which Hyperplane?•Lots of possible solutions for a,b,c.•Some methods find a separating hyperplane,but not the optimal one (e.g., perceptron)•Most methods find an optimal separatinghyperplane•Which points should influence optimality?–All points•Linear regression•Naïve Bayes–Only “difficult points” close to decisionboundary•Support vector machines•Logistic regression (kind of)Support Vectors again for linearlyseparable case•Support vectors are the elements of the training set that would change the position of the dividing hyper plane if removed.•Support vectors are the critical elements of the training set•The problem of finding the optimal hyper plane is an optimization problem and can be solved by optimization techniques (use Lagrange multipliers to get into a form that can be solved analytically).X X X XXXSupport Vectors: Input vectors for which w 0T x + b 0= 1 or w 0T x + b 0= -1ρ0d +d -Definitions Define the hyperplane H such that:x i •w +b ≥+1 when y i =+1x i •w +b ≤-1 when y i =-1d+ = the shortest distance to the closest positive pointd-= the shortest distance to the closest negative pointThe margin of a separating hyperplane is d ++ d -.HH1 and H2 are the planes:H1: x i •w +b = +1H2: x i •w +b = -1The points on the planes H1 and H2 are the Support Vectors H1H2Moving a support vectormoves the decisionboundaryMoving the other vectors has no effect The algorithm to generate the weights proceeds in such a way that only the support vectors determine the weights and thus the boundary Maximizing the margind+d-We want a classifier with as big margin as possible.Recall the distance from a point(x 0,y 0) to a line:Ax+By+c = 0 is|A x 0+B y 0+c|/sqrt(A 2+B 2)The distance between H and H1 is:|w•x +b|/||w||=1/||w||The distance between H1 and H2 is: 2/||w||In order to maximize the margin, we need to minimize ||w||. With the condition that there are no datapoints between H1 and H2:x i •w +b ≥+1 when y i =+1x i •w +b ≤-1 when y i =-1 Can be combined into y i (x i •w) ≥1H1H2HWe now must solve a quadratic programming problem •Problem is: minimize||w||, s.t. discrimination boundary is obeyed, i.e., min f(x) s.t. g(x)=0, wheref: ½||w||2andg: y i(x i•w)-b = 1 or[y i(x i•w)-b] -1=0This is a constrained optimization problem Solved by Lagrangian multipler methodflatten paraboloid2-x2-2y2Intuition:intersection of two functions at atangent point.flattened paraboloid2-x2-2y2with superimposed constraint x2 +y2 = 1flattened paraboloid f: 2-x2-2y2=0with superimposed constraint g: x+y= 1Maximize when the constraint line g is tangent to the inner ellipse contour line of fflattened paraboloid f:2-x2-2y2=0 with superimposed constraint g: x+y= 1; at tangent solution p, gradient vectors of f,g are parallel (no possible move to incr f that also keeps you in region g) Maximize when the constraint line g is tangent to the inner ellipse contour line of fTwo constraints1.Parallel normal constraint (= gradient constrainton f, g solution is a max)2.G(x)=0 (solution is on the constraint line)We now recast these by combining f, g as theLagrangianRedescribing these conditions •Want to look for solution point p where•Or, combining these two as the Langrangian L & requiring derivative of L be zero:()()()0f pg p g x λ∇=∇=(,)()()(,)0L x f x g x x λλλ=−∇=How Langrangian solves constrainedoptimization(,)()() where(,)0L x f x g x x λλλ=−∇=Partial derivatives wrt x recover the parallel normal constraintPartial derivatives wrt λrecover the g (x,y )=0In general, (,)()()i i i L x f x g x λλ=+∑In general(,)()() a function of variables for the ', for the . Differentiating gives equations, each set to 0. The eqns differentiated wrt each give the gradient conditions; the i i i i L x f x g x n m n x s m n m n x ααα=+++∑ eqns differentiated wrt each recover the constraints i i m g αGradient max of f constraint condition gIn our case, f (x ): ½|| w ||2; g(x ): y i (w .x i +b)-1=0 so Lagrangian isL= ½|| w ||2-Σαi [y i (w .x i +b)-1]Lagrangian Formulation•In the SVM problem the Lagrangian is •From the derivatives = 0 we get()212110,l lP i i i ii i i L y b iααα==≡−⋅++≥∀∑∑w x w 11,0l li i i i i i i y y αα====∑∑w xThe Lagrangian trick Reformulate the optimization problem:A ”trick” often used in optimization is to do an Lagrangian formulation of the problem.The constraints will be replaced by constraints on the Lagrangian multipliers and the training data will occur only as dot products.Gives us the task:Max L = ∑αi –½∑αiαj x i•x j,Subject to:w= ∑αi y i xi∑αi y i= 0What we need to see: x i and x j(input vectors) appear only in the formof dot product –we will soon see why that is important.The Dual problem•Original problem:fix value of f and find α•New problem:Fix the values of α, and solve the(now unconstrained) problem max L(α, x)•Ie, get a solution for each α, f*(α)•Now minimize this over the space of α•Kuhn-Tucker theorem: this is equivalent tooriginal problemAt a solution p•The the constraint line g and the contour lines of f must be tangent•If they are tangent, their gradient vectors (perpindiculars) are parallel•Gradient of g must be 0 –I.e., steepest ascent & so perpendicular to f•Gradient of f must also be in the same direction as gInner productsThe task:Max L = ∑αi –½∑αiαj x i•x j,Subject to:w= ∑αi y i x i∑αi y i= 0Inner productInner productsWhy should inner product kernels be involved in pattern recognition?--Intuition is that they provide some measure of similarity --cf Inner product in 2D between 2 vectors of unit length returns the cosine of the angle between them.e.g. x= [1, 0]T, y= [0, 1]TI.e. if they are parallel inner product is 1x T x= x.x= 1If they are perpendicular inner product is 0x T y= x.y= 0But…are we done???Not Linearly SeparableFind a line that penalizespoints on “the wrong side”.x x x x x xx ϕ(o)X Fϕϕ(x)ϕ(x)ϕ(x)ϕ(x)ϕ(x)ϕ(x)ϕ(x)ϕ(o)ϕ(o)ϕ(o)ϕ(o)ϕ(o)ϕ(o)o o o o ooTransformation to separateNon Linear SVMsa b()()()2x a x b x a b x ab−−=−++{}2,x x x6•The idea is to gain linearly separation by mapping the data to a higher dimensional space –The following set can’t be separated by a linear function, but can be separated by a quadratic one–So if we map we gain linear separation Problems with linear SVM=-1=+1What if the decision function is not linear? What transform would separate these?Ans: polar coordinates!Non-linear SVM 1The Kernel trick =-1=+1Imagine a function φthat maps the data into another space:φ=Rd →Η=-1=+1Remember the function we want to optimize: L dual = ∑αi –½∑αi αj x i •x j ,x i and x j as a dot product. We will have φ(x i ) •φ(x j ) in the non-linear case.If there is a ”kernel function”K such as K(xi,xj) = φ(xi) •φ(xj), we do not need to know φexplicitly. One example: Rd ΗφWe’ve already seen a nonlineartransform…•What is it???•tanh(β0x T x i + β1)Examples for Non Linear SVMs()(),1pK =⋅+x y x y (){}222,exp K σ−=−x y x y ()(),tanh K κδ=⋅−x y x y 1st is polynomial (includes x•x as special case)2nd is radial basis function (gaussians)3rd is sigmoid (neural net activation function)Inner Product KernelsMercer’s theorem issatisfied only for some values of β0and β1tanh(β0x T x i + β1)Two layer perceptron The width σ2is specified apriori exp(1/(2σ2)||x-x i ||2)Radial-basis functionnetworkPower p is specified apriori by the user (x T x i + 1)p Polynomial learningmachineComments Inner Product KernelK(x,x i ), I = 1, 2, …, NType of Support Vector MachineNon-linear svm2The function we end up optimizing is:Max Ld = ∑αi–½∑αiαj K(xi•x j),Subject to:w= ∑αi y i x i∑αi y i= 0Another kernel example: The polynomial kernelK(xi,xj) = (xi•xj + 1)p, where p is a tunable parameter. Evaluating K only require one addition and one exponentiation more than the original dot product.Examples for Non Linear SVMs2 –Gaussian KernelLinearGaussianNonlinear rbf kernel Admiral’s delight w/ difft kernelfunctionsOverfitting by SVMBuilding an SVM Classifier •Now we know how to build a separator for two linearly separable classes•What about classes whose exemplary examples are not linearly separable?。