Kernel Functions in Machine Learning
The Radial Basis Kernel: Overview and Explanation

1. Introduction
1.1 Overview
The radial basis function (RBF) kernel is a kernel function widely used in machine learning. By implicitly mapping data into a high-dimensional feature space, it makes non-linear classification and regression problems tractable. Its defining strength is its ability to capture non-linear relationships between data points, which improves a model's generalization ability and predictive accuracy.
This article explains the principles and applications of the radial basis kernel, analyses its advantages and disadvantages, and closes with a summary and outlook, so that readers gain a complete picture of the topic.
1.2 Structure of the article
This article consists of three parts: the introduction, the main body, and the conclusion.
The introduction presents the concept and purpose of the radial basis kernel, giving readers an initial understanding of the topic, and lays out the structure and content of the article so that the reading framework is clear.
The main body examines the radial basis kernel in depth. By walking through its principles and the algorithms that use it, it helps readers understand the role the kernel plays in machine learning and pattern recognition. It also discusses the kernel's advantages and disadvantages and evaluates how it performs in practical applications.
The conclusion summarizes the article and reviews the points discussed, looks ahead to future applications of the radial basis kernel and possible directions of development, and ends with a few closing remarks.
1.3 Purpose
This article aims to examine the principles of the radial basis kernel and its applications in machine learning. Through explanations and application examples, we hope readers will gain a well-rounded understanding of this important kernel and be able to apply it flexibly to real problems. We also analyse its strengths and weaknesses so that readers can weigh them when choosing a suitable kernel. Ultimately, the goal is to provide deeper insight into the radial basis kernel and practical guidance for using it.
2. Main body
2.1 Understanding the radial basis kernel
The radial basis function (RBF) kernel, also known as the Gaussian kernel, is one of the most commonly used kernels. In support vector machines (SVMs) and other machine learning algorithms, it is widely applied to non-linear classification and regression problems.
The key to understanding the RBF kernel lies in how it is computed and why it works: it evaluates the similarity between data points by measuring the distance between a data point and a chosen centre point and converting that distance into a similarity score.
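The following minimal MATLAB sketch shows this distance-to-similarity conversion for a single pair of points; the vectors and the bandwidth value `gamma` are made up purely for illustration.
```matlab
% RBF similarity between two points: small distance -> similarity near 1,
% large distance -> similarity near 0.
x = [1.0 2.0];
y = [1.5 1.0];
gamma = 0.5;                    % illustrative bandwidth parameter
d2 = sum((x - y).^2);           % squared Euclidean distance
k  = exp(-gamma * d2);          % RBF similarity, a value in (0, 1]
disp(k)
```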
Kernel Functions in MATLAB

In MATLAB, kernel functions are mainly used with support vector machines (SVMs) and other machine learning algorithms. They compute the similarity (or a distance-based similarity) between two vectors. Some common kernels, written for row vectors `x` and `y`, are:
1. **Linear kernel**:
```matlab
K = x * y';
```
2. **Polynomial kernel**:
```matlab
K = (gamma * x * y' + coef0)^degree;
```
3. **Gaussian radial basis function (RBF) kernel**:
```matlab
K = exp(-gamma * (x - y) * (x - y)');
```
4. **Sigmoid kernel**:
```matlab
K = tanh(gamma * x * y' + coef0);
```
Here `x` and `y` are the input (row) vectors, `gamma` and `coef0` are kernel parameters, and `degree` is the degree of the polynomial kernel.
In MATLAB's Statistics and Machine Learning Toolbox, you can train a kernel SVM classifier with the `fitcsvm` function, choosing among different kernel types through its `'KernelFunction'` option. For example:
```matlab
% Create some data: two 2-D classes
X = [randn(100, 2); randn(100, 2) + 2];   % 200 samples, 2 features
Y = [ones(100, 1); -ones(100, 1)];        % corresponding labels
% Train an SVM classifier with the 'rbf' kernel
SVMModel = fitcsvm(X, Y, 'KernelFunction', 'rbf');
```
Note that which kernel to choose usually depends on your specific application and the characteristics of your data. Different kernels can produce different results, so some adjustment and experimentation is needed for your problem.
Kernel Functions in Bayesian Optimization

Bayesian optimization is an optimization method based on Bayesian inference for finding the optimum in complex search spaces, and it is widely used in machine learning and optimization.
A kernel function is a common concept in machine learning that measures the similarity between two samples: it implicitly maps the inputs into a high-dimensional feature space and measures similarity as the inner product in that space.
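As a concrete worked example of this idea, take two 2-dimensional points x = (x_1, x_2) and y = (y_1, y_2) and the quadratic kernel K(x, y) = (x·y)^2. Expanding gives (x_1 y_1 + x_2 y_2)^2 = x_1^2 y_1^2 + 2 x_1 x_2 y_1 y_2 + x_2^2 y_2^2, which is exactly the inner product of the explicit feature vectors φ(x) = (x_1^2, √2·x_1 x_2, x_2^2) and φ(y): the kernel measures similarity in a 3-dimensional feature space without ever computing φ explicitly.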
In Bayesian optimization, the kernel defines the similarity between points of the search space. Commonly used kernels include the Gaussian kernel (also called the radial basis function kernel) and the linear kernel; they compute similarity from the features of the samples.
Kernels appear in two places in Bayesian optimization:
1. The surrogate model. Bayesian optimization usually approximates the objective function with a surrogate model, and the kernel defines the similarity between samples inside that model, for example the kernel of a Gaussian-process regressor. From the relationship between the already-evaluated points and their objective values, Bayesian optimization uses the kernel to build the surrogate model, which then drives further optimization and exploration.
2. The sampling strategy. Bayesian optimization chooses the next point to evaluate from the surrogate model's predictions, so as to find better solutions in the search space. The kernel's similarity measure is used to select points with high uncertainty or high potential. Common acquisition strategies include Expected Improvement and the Upper Confidence Bound, both of which rely on the kernel-based surrogate to balance exploration against exploitation.
In short, kernels in Bayesian optimization play an important role in defining similarity between samples, building the surrogate model, and shaping the sampling strategy. Choosing the kernel well can improve both the efficiency and the accuracy of the optimizer.
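A minimal sketch of this workflow using MATLAB's `bayesopt` from the Statistics and Machine Learning Toolbox is shown below; the toy objective, the variable bounds, and the evaluation budget are illustrative assumptions rather than part of the original text.
```matlab
% Bayesian optimization of a toy 1-D objective. bayesopt builds a Gaussian-
% process surrogate internally and uses an acquisition function to pick the
% next point to evaluate.
objective = @(t) (t.x - 2).^2 + sin(5 * t.x);          % made-up objective
xvar = optimizableVariable('x', [-5, 5]);              % search interval
results = bayesopt(objective, xvar, ...
    'AcquisitionFunctionName', 'expected-improvement', ...
    'MaxObjectiveEvaluations', 20);
bestPoint = results.XAtMinObjective;                   % best sampled point
```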
Common Kernel Functions

Kernel functions are a widely used technique in machine learning. They implicitly map data into a higher-dimensional feature space in which the problem becomes easier, thereby improving the performance of many algorithms; they are used extensively in SVMs, PCA, kernel PCA (KPCA), and other methods. The most common kernels are introduced below.
1. Linear kernel
The linear kernel is the simplest kernel. It corresponds to the ordinary inner product in the original input space, with no non-linear mapping:
K(x_i, x_j) = x_i · x_j
where x_i and x_j are two samples from the data set and the result is a scalar. Its advantage is its low computational cost, which suits large data sets; its drawback is that it can only handle linearly separable data.
2. Polynomial kernel
K(x_i, x_j) = (x_i · x_j + c)^d
where x_i and x_j are two samples, c is a constant, and d is the degree of the polynomial. The polynomial kernel is suited to data that are not linearly separable.
3. Radial basis function (RBF) kernel
K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
where x_i and x_j are two samples, gamma is a positive constant, and ||x_i - x_j||^2 is the squared Euclidean distance between the two points.
4. Sigmoid kernel
K(x_i, x_j) = tanh(alpha * (x_i · x_j) + beta)
where x_i and x_j are two samples and alpha and beta are the parameters of the sigmoid function. The sigmoid kernel is often used for binary classification problems.
These four kernels are the most commonly used ones. Each has its own strengths and weaknesses, so the appropriate kernel should be chosen for the algorithm and data at hand. Beyond these four, several other kernels are also of practical importance.
5. Laplacian kernel
The Laplacian kernel is computed much like the radial basis function kernel, turning the distance between two points into a similarity:
K(x_i, x_j) = exp(-gamma * ||x_i - x_j||)
where gamma plays the same role as in the RBF kernel. The Laplacian kernel is widely used in areas such as image recognition and natural language processing.
6. ANOVA kernel
The ANOVA kernel is mostly used in data analysis and statistics, and it performs well for models that mix several types of data. It is defined in terms of features h_i and h_j extracted from the sample points together with a constant gamma.
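As a hedged illustration of how the kernels listed above translate into code, the following base-MATLAB function builds the Gram matrix for a chosen kernel; the function name `kernel_matrix` and the parameter struct `p` are assumptions made for this sketch, not part of any toolbox.
```matlab
function K = kernel_matrix(X, Z, kind, p)
% Gram matrix K(i,j) = k(X(i,:), Z(j,:)) for a few common kernels.
% X is n-by-d, Z is m-by-d, and p is a struct of kernel parameters.
    G  = X * Z';                                   % pairwise inner products
    D2 = sum(X.^2, 2) + sum(Z.^2, 2)' - 2 * G;     % squared Euclidean distances
    switch lower(kind)
        case 'linear',     K = G;
        case 'polynomial', K = (G + p.c).^p.d;
        case 'rbf',        K = exp(-p.gamma * D2);
        case 'laplacian',  K = exp(-p.gamma * sqrt(max(D2, 0)));
        case 'sigmoid',    K = tanh(p.alpha * G + p.beta);
        otherwise,         error('Unknown kernel type: %s', kind);
    end
end
```
For example, `K = kernel_matrix(randn(5, 3), randn(4, 3), 'rbf', struct('gamma', 0.5))` returns a 5-by-4 matrix of RBF similarities.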
Computing and Applying Kernel Functions

Kernel functions play an important role in machine learning and pattern recognition: they map the input data into a higher-dimensional feature space, which makes linearly inseparable problems solvable. This section describes how kernels are computed and how they are used in algorithms such as support vector machines (SVMs) and principal component analysis (PCA).
I. How kernels are computed
A kernel function maps data from a low-dimensional space into a higher-dimensional one. Common kernels include the linear kernel, the polynomial kernel, and the Gaussian radial basis function.
1. Linear kernel. The simplest kernel; it works directly with the original features:
K(x, y) = x·y
2. Polynomial kernel. Maps the data into a higher-dimensional space through a polynomial:
K(x, y) = (x·y + c)^d
3. Gaussian radial basis function (RBF). A very common kernel that corresponds to an infinite-dimensional feature space:
K(x, y) = exp(-γ ||x-y||^2)
where γ is the bandwidth parameter of the Gaussian kernel and ||x-y|| is the Euclidean distance between the inputs x and y.
II. Kernels in support vector machines
The support vector machine is a widely used classifier that performs well on problems that are not linearly separable, and the kernel is what makes this possible.
1. Linear SVM. A linear SVM uses the linear kernel, i.e. it works with the original features. It performs well on linearly separable problems and is computationally efficient.
2. Non-linear SVM. A non-linear SVM maps the data with a non-linear kernel and can therefore solve problems that are not linearly separable. The most common non-linear kernels are the polynomial kernel and the Gaussian RBF kernel.
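A hedged MATLAB comparison of these choices, again using `fitcsvm` from the Statistics and Machine Learning Toolbox, might look as follows; the synthetic data and parameter values are illustrative only.
```matlab
rng(1);
X = [randn(100, 2); randn(100, 2) + 2];
Y = [ones(100, 1); -ones(100, 1)];
mdlLinear = fitcsvm(X, Y, 'KernelFunction', 'linear');
mdlPoly   = fitcsvm(X, Y, 'KernelFunction', 'polynomial', 'PolynomialOrder', 3);
mdlRBF    = fitcsvm(X, Y, 'KernelFunction', 'rbf', 'KernelScale', 1);
% Training (resubstitution) error for each kernel; cross-validation would be
% preferable in practice.
errs = [resubLoss(mdlLinear), resubLoss(mdlPoly), resubLoss(mdlRBF)]
```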
III. Kernels in principal component analysis
Principal component analysis is a widely used dimensionality-reduction technique that projects high-dimensional data onto a low-dimensional space while retaining the most important directions of variation. Kernels are also widely used here.
1. Kernel PCA. Kernel PCA extends ordinary PCA: it first maps the data into a high-dimensional feature space with a non-linear kernel and then performs the dimensionality reduction there. Compared with standard PCA, kernel PCA handles non-linear structure much better.
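The following base-MATLAB sketch shows the core of kernel PCA with an RBF kernel; the toy data, the bandwidth `gamma`, and the number of retained components are assumptions made for illustration.
```matlab
rng(0);
X = [randn(50, 2); randn(50, 2) + 3];           % toy data, n = 100 points
n = size(X, 1);
gamma = 0.5;
D2 = sum(X.^2, 2) + sum(X.^2, 2)' - 2 * (X * X');
K  = exp(-gamma * D2);                          % RBF Gram matrix
J  = ones(n) / n;
Kc = K - J * K - K * J + J * K * J;             % centre the kernel matrix
[V, L] = eig((Kc + Kc') / 2);                   % symmetrise for numerical safety
[lambda, idx] = sort(real(diag(L)), 'descend');
V = real(V(:, idx));
k = 2;                                          % number of components to keep
alpha = V(:, 1:k) ./ sqrt(lambda(1:k)');        % normalised eigenvectors
Z = Kc * alpha;                                 % kernel-PCA projections (n-by-k)
```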
The Gaussian Radial Basis Kernel

The Gaussian radial basis kernel is widely used in machine learning. It has the form
K(x_i, x_j) = exp(-γ ||x_i - x_j||^2)
where x_i and x_j are two samples from the data set, γ is the bandwidth parameter of the Gaussian, and ||·|| denotes the Euclidean distance (so the exponent uses the squared Euclidean distance between the two points).
The Gaussian radial basis kernel implicitly maps the data from the original low-dimensional space into a high-dimensional one, which makes linearly inseparable problems much easier to handle. It is widely used in support vector machines, kernel principal component analysis, radial basis function networks, and related methods.
In practice the bandwidth parameter γ has to be tuned, because different values of γ lead to different classification results. Broadly speaking, a large γ makes the Gaussian very narrow, so the decision boundary becomes highly flexible and the model tends to overfit; a small γ makes the kernel very flat, so the model behaves almost linearly and tends to underfit. The appropriate value of γ therefore has to be chosen for the specific problem and the characteristics of the data.
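The small base-MATLAB sketch below visualises this effect by plotting the RBF similarity as a function of distance for a few illustrative γ values.
```matlab
d = linspace(0, 4, 100);                  % distance between two points
gammas = [0.1, 1, 10];
K = exp(-gammas' * d.^2);                 % one row of similarities per gamma
plot(d, K); grid on;
xlabel('||x_i - x_j||'); ylabel('K(x_i, x_j)');
legend('\gamma = 0.1', '\gamma = 1', '\gamma = 10');
% Large gamma: similarity decays quickly (flexible fit, risk of overfitting).
% Small gamma: similarity decays slowly (smooth fit, risk of underfitting).
```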
Choosing a Kernel for Gaussian Processes

Gaussian processes are a widely used machine learning method, and the kernel (covariance function) is their central component: the choice of kernel directly determines how well a Gaussian process performs.
Some commonly used Gaussian-process kernels and their characteristics are listed below.
1. Gaussian kernel, also called the radial basis function (RBF) kernel: one of the most common choices. Its form is $k(x,x')=\sigma^2\exp(-\frac{\|x-x'\|^2}{2l^2})$, where $\sigma^2$ and $l$ are hyperparameters and $\|x-x'\|$ is the Euclidean distance between $x$ and $x'$. Its strengths are good fitting and generalization ability; its weakness is that the hyperparameters can be hard to choose.
2. Linear kernel: $k(x,x')=x^Tx'$, where $x$ and $x'$ are the input vectors. It is fast to compute, but its fitting ability is relatively weak.
3. Polynomial kernel: $k(x,x')=(x^Tx'+c)^d$, where $c$ and $d$ are hyperparameters. It can handle non-linear problems, but the hyperparameters can be hard to choose.
4. Exponential kernel: $k(x,x')=\exp(-\frac{\|x-x'\|}{l})$, where $l$ is a hyperparameter. It can model non-linear behaviour, but its fitting ability is relatively weak.
5. Laplace kernel: $k(x,x')=\exp(-\frac{\|x-x'\|}{l})$, the same functional form as the exponential kernel above (the two names are often used interchangeably), with hyperparameter $l$ and similar fitting behaviour.
These are some of the kernels most commonly used with Gaussian processes; which one to choose depends on the specific problem.
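A hedged sketch of how such a comparison might look in MATLAB, using `fitrgp` from the Statistics and Machine Learning Toolbox (the synthetic data and the two kernel choices are assumptions for illustration):
```matlab
rng(2);
x = linspace(0, 10, 60)';
y = sin(x) + 0.2 * randn(size(x));                               % noisy samples
gprSE  = fitrgp(x, y, 'KernelFunction', 'squaredexponential');   % Gaussian / RBF kernel
gprExp = fitrgp(x, y, 'KernelFunction', 'exponential');          % exponential kernel
xq = linspace(0, 10, 200)';
ypredSE  = predict(gprSE, xq);
ypredExp = predict(gprExp, xq);
plot(x, y, 'k.', xq, ypredSE, 'b-', xq, ypredExp, 'r--');
legend('data', 'squared exponential', 'exponential');
```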
The Sigmoid Kernel

The sigmoid function (the S-shaped function) is a non-linear function with good classification behaviour and is widely used as a classification function in computer vision tasks. It is essentially a non-linear transformation that squashes a continuously varying real-valued input into a bounded value, much like the on/off behaviour of a transistor-based binary logic circuit, which makes it usable as a binary classifier for the input signal. (When it is used as an SVM kernel, the "sigmoid kernel" normally refers to the closely related hyperbolic-tangent form tanh(α x·y + c) described earlier.)
The sigmoid function is defined as
f(x) = 1/(1 + e^(-x))
where e is the base of the natural logarithm and x is the input variable.
As x grows large, the function value rises towards 1 but never exceeds it; as x becomes strongly negative, the value approaches 0, and the transition between the two happens around x = 0. The curve is smooth and differentiable, with its steepest slope near x = 0, so gradient-descent training behaves well: the parameters change smoothly from iteration to iteration, which improves the efficiency of the optimization and reduces the amount of computation.
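The short base-MATLAB sketch below plots the logistic sigmoid and also evaluates the tanh form used as the SVM sigmoid kernel; the two example vectors and the parameters `alpha` and `beta` are illustrative values.
```matlab
x = linspace(-6, 6, 200);
f = 1 ./ (1 + exp(-x));                   % logistic sigmoid, values in (0, 1)
plot(x, f); grid on; title('f(x) = 1/(1+e^{-x})');

u = [1.0 -0.5 2.0]; v = [0.5 1.5 -1.0];   % two example feature vectors
alpha = 0.1; beta = -1;
k_sig = tanh(alpha * (u * v') + beta)     % sigmoid (MLP) kernel value
```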
The sigmoid function is widely used in computer vision, where it supports fine-grained classification in recognition systems. For example, given a set of distance measurements, a sigmoid classifier can split them into two classes, one belonging to a known object and the other to everything else, thereby performing object classification. It is also used throughout machine learning, for instance in the neural networks of deep learning, for classification problems such as multi-class classification as well as for regression.
In summary, the sigmoid function can classify real-valued inputs fairly accurately and fits computer-vision and deep-learning tasks well, which makes it a highly practical processing function.
Kernel Functions for Machine Learning Applications

1. Overview of Kernel Functions
In recent years, kernel methods have received major attention, particularly due to the increased popularity of support vector machines. Kernel functions can be used in many applications, as they provide a simple bridge from linearity to non-linearity for any algorithm that can be expressed in terms of dot products. This article lists a number of kernel functions and some of their properties. Many of these functions have been incorporated in an extension framework for the popular Framework, which also includes many other statistics and machine learning tools.

2. Kernel Functions in Machine Learning

Kernel Methods
Kernel methods are a class of algorithms for pattern analysis or recognition, whose best-known element is the support vector machine (SVM). The general task of pattern analysis is to find and study general types of relations (such as clusters, rankings, principal components, correlations, classifications) in general types of data (such as sequences, text documents, sets of points, vectors, images, graphs, etc.) (Wikipedia, 2010a).
The main characteristic of kernel methods, however, is their distinct approach to this problem. Kernel methods map the data into higher-dimensional spaces in the hope that in this higher-dimensional space the data become more easily separated or better structured. There are also no constraints on the form of this mapping, which can even lead to infinite-dimensional spaces. This mapping function, however, hardly ever needs to be computed, because of a tool called the kernel trick.

The Kernel Trick
The kernel trick is a mathematical tool that can be applied to any algorithm that depends solely on the dot product between two vectors. Wherever a dot product is used, it is replaced by a kernel function. When properly applied, candidate linear algorithms are transformed into non-linear algorithms, sometimes with little effort or reformulation. Those non-linear algorithms are equivalent to their linear originals operating in the range of a feature-space mapping φ. However, because kernels are used, the φ function never needs to be computed explicitly. This is highly desirable because, as noted above, the higher-dimensional feature space may even be infinite-dimensional and thus infeasible to compute. There are also no constraints on the nature of the input vectors: dot products can be defined between any kind of structure, such as trees or strings.

Kernel Properties
Kernel functions must be continuous and symmetric, and most preferably should have a positive (semi-)definite Gram matrix. Kernels that satisfy Mercer's theorem are positive semi-definite, meaning their kernel matrices have no negative eigenvalues. The use of a positive definite kernel ensures that the optimization problem is convex and the solution is unique.
However, many kernel functions that are not strictly positive definite have also been shown to perform very well in practice. An example is the sigmoid kernel, which, despite its wide use, is not positive semi-definite for certain values of its parameters. Boughorbel (2005) also experimentally demonstrated that kernels that are only conditionally positive definite can outperform most classical kernels in some applications.
Kernels can also be classified as anisotropic stationary, isotropic stationary, compactly supported, locally stationary, nonstationary, or separable nonstationary. Moreover, kernels can be labeled scale-invariant or scale-dependent, which is an interesting property, as scale-invariant kernels make the training process invariant to a scaling of the data.
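As a quick illustration of the positive semi-definiteness property, the base-MATLAB check below builds an RBF Gram matrix for random data (the data and the `gamma` value are made up) and confirms that its smallest eigenvalue is non-negative up to numerical error.
```matlab
rng(3);
X = randn(30, 4);
gamma = 0.2;
D2 = sum(X.^2, 2) + sum(X.^2, 2)' - 2 * (X * X');
K  = exp(-gamma * D2);                  % RBF Gram matrix (a Mercer kernel)
minEig = min(eig((K + K') / 2))         % should be >= 0 up to round-off
```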
Choosing the Right Kernel
Choosing the most appropriate kernel depends heavily on the problem at hand, and fine-tuning its parameters can easily become a tedious and cumbersome task. Automatic kernel selection is possible and is discussed in the works by Tom Howley and Michael Madden. The choice of kernel depends on the problem because it depends on what we are trying to model. A polynomial kernel, for example, allows us to model feature conjunctions up to the order of the polynomial. Radial basis functions allow us to pick out circles (or hyperspheres), in contrast with the linear kernel, which allows us to pick out only lines (or hyperplanes). The motivation behind the choice of a particular kernel can be very intuitive and straightforward, depending on what kind of information we expect to extract about the data. Please see the final notes on this topic in Introduction to Information Retrieval, by Manning, Raghavan and Schütze, for a better explanation of the subject.

Kernel Functions
Below is a list of kernel functions available in the existing literature. I cannot guarantee all of them are perfectly correct, so use them at your own risk. Most of them are accompanied by references to the articles where they were originally used or proposed.

1. Linear Kernel
The linear kernel is the simplest kernel function. It is given by the inner product <x, y> plus an optional constant c. Kernel algorithms using a linear kernel are often equivalent to their non-kernel counterparts; for example, KPCA with a linear kernel is the same as standard PCA.

2. Polynomial Kernel
The polynomial kernel is a non-stationary kernel. Polynomial kernels are well suited to problems where all the training data are normalized. The adjustable parameters are the slope alpha, the constant term c, and the polynomial degree d.

3. Gaussian Kernel
The Gaussian kernel is an example of a radial basis function kernel. The adjustable parameter sigma plays a major role in the performance of the kernel and should be carefully tuned to the problem at hand. If overestimated, the exponential will behave almost linearly and the higher-dimensional projection will start to lose its non-linear power. On the other hand, if underestimated, the function will lack regularization and the decision boundary will be highly sensitive to noise in the training data.

4. Exponential Kernel
The exponential kernel is closely related to the Gaussian kernel, with only the square of the norm left out. It is also a radial basis function kernel.

5. Laplacian Kernel
The Laplace kernel is completely equivalent to the exponential kernel, except for being less sensitive to changes in the sigma parameter. Being equivalent, it is also a radial basis function kernel. It is important to note that the observations made about the sigma parameter for the Gaussian kernel also apply to the exponential and Laplacian kernels.

6. ANOVA Kernel
The ANOVA kernel is also a radial basis function kernel, just like the Gaussian and Laplacian kernels. It is said to perform well in multidimensional regression problems (Hofmann, 2008).
7. Hyperbolic Tangent (Sigmoid) Kernel
The hyperbolic tangent kernel is also known as the sigmoid kernel and as the multilayer perceptron (MLP) kernel. It comes from the neural networks field, where the bipolar sigmoid function is often used as an activation function for artificial neurons. It is interesting to note that an SVM model using a sigmoid kernel function is equivalent to a two-layer perceptron neural network. This kernel was quite popular for support vector machines due to its origin in neural network theory. Also, despite being only conditionally positive definite, it has been found to perform well in practice. There are two adjustable parameters in the sigmoid kernel: the slope alpha and the intercept constant c. A common value for alpha is 1/N, where N is the data dimension. A more detailed study of sigmoid kernels can be found in the works by Hsuan-Tien and Chih-Jen.

8. Rational Quadratic Kernel
The rational quadratic kernel is less computationally intensive than the Gaussian kernel and can be used as an alternative when the Gaussian becomes too expensive.

9. Multiquadric Kernel
The multiquadric kernel can be used in the same situations as the rational quadratic kernel. As is the case with the sigmoid kernel, it is an example of a non-positive-definite kernel.

10. Inverse Multiquadric Kernel
As with the Gaussian kernel, the inverse multiquadric kernel results in a kernel matrix with full rank (Micchelli, 1986) and thus forms an infinite-dimensional feature space.

11. Circular Kernel
The circular kernel is used in geostatistical applications. It is an example of an isotropic stationary kernel and is positive definite in R2.

12. Spherical Kernel
The spherical kernel is similar to the circular kernel, but is positive definite in R3.

13. Wave Kernel
The wave kernel is also symmetric positive semi-definite (Huang, 2008).

14. Power Kernel
The power kernel is also known as the (unrectified) triangular kernel. It is an example of a scale-invariant kernel (Sahbi and Fleuret, 2004) and is also only conditionally positive definite.

15. Log Kernel
The log kernel seems to be particularly interesting for images, but it is only conditionally positive definite.

16. Spline Kernel
The spline kernel is given as a piecewise cubic polynomial, as derived in the work by Gunn (1998).

17. B-Spline (Radial Basis Function) Kernel
The B-spline kernel is defined on the interval [−1, 1] and is given by a recursive formula; Bart Hamers gives an equivalent form, and Bn can alternatively be computed using an explicit expression in terms of the truncated power function (Fomel, 2000).

18. Bessel Kernel
The Bessel kernel is well known in the theory of function spaces of fractional smoothness. It is defined in terms of J, the Bessel function of the first kind; the Kernlab documentation for R states it in a slightly different form.

19. Cauchy Kernel
The Cauchy kernel comes from the Cauchy distribution (Basak, 2008). It is a long-tailed kernel and can be used to give long-range influence and sensitivity over the high-dimensional space.

20. Chi-Square Kernel
The chi-square kernel comes from the chi-square distribution.

21. Histogram Intersection Kernel
The histogram intersection kernel is also known as the min kernel and has proven useful in image classification.

22. Generalized Histogram Intersection Kernel
The generalized histogram intersection kernel is built on the histogram intersection kernel for image classification but applies in a much larger variety of contexts (Boughorbel, 2005).
23. Generalized T-Student Kernel
The generalized T-Student kernel has been proven to be a Mercer kernel, thus having a positive semi-definite kernel matrix (Boughorbel, 2004).

24. Bayesian Kernel
The exact form of the Bayesian kernel depends on the problem being modeled. For more information, please see the work by Alashwal, Deris and Othman, in which they used an SVM with Bayesian kernels for the prediction of protein-protein interactions.

25. Wavelet Kernel
The wavelet kernel (Zhang et al., 2004) comes from wavelet theory and is parameterized by the wavelet dilation and translation coefficients a and c (the published form is a simplification; please see the original paper for details). A translation-invariant version of the kernel also exists. In both cases h(x) denotes a mother wavelet function; in the paper by Li Zhang, Weida Zhou, and Licheng Jiao, the authors suggest a particular h(x) and prove that it yields an admissible kernel function.

See Also
Kernel Support Vector Machines (kSVMs)
Principal Component Analysis (PCA)

3. References
On-Line Prediction Wiki Contributors. "Kernel Methods." On-Line Prediction Wiki. /?n=Main.KernelMethods (accessed March 3, 2010).
Genton, Marc G. "Classes of Kernels for Machine Learning: A Statistics Perspective." Journal of Machine Learning Research 2 (2001): 299-312.
Hofmann, T., B. Schölkopf, and A. J. Smola. "Kernel Methods in Machine Learning." Annals of Statistics 36, no. 3 (2008): 1171-1220.
Gunn, S. R. "Support Vector Machines for Classification and Regression." Technical report, Faculty of Engineering, Science and Mathematics, School of Electronics and Computer Science, May 1998.
Karatzoglou, A., A. Smola, K. Hornik, and A. Zeileis. "Kernlab – an R Package for Kernel Learning." 2004.
Karatzoglou, A., A. Smola, K. Hornik, and A. Zeileis. "Kernlab – an S4 Package for Kernel Methods in R." Journal of Statistical Software 11, no. 9 (2004).
Karatzoglou, A., A. Smola, K. Hornik, and A. Zeileis. "R: Kernel Functions." Documentation for package 'kernlab' version 0.9-5. /Rdoc/library/kernlab/html/dots.html (accessed March 3, 2010).
Howley, T., and M. G. Madden. "The Genetic Kernel Support Vector Machine: Description and Evaluation." Artificial Intelligence Review 24, no. 3 (2005): 379-395.
Ali, Shawkat, and Kate A. Smith. "Kernel Width Selection for SVM Classification: A Meta-Learning Approach." International Journal of Data Warehousing & Mining 1, no. 4 (2005): 78-97.
Lin, Hsuan-Tien, and Chih-Jen Lin. "A Study on Sigmoid Kernels for SVM and the Training of Non-PSD Kernels by SMO-Type Methods." Technical report, Department of Computer Science, National Taiwan University, 2003.
Boughorbel, S., Jean-Philippe Tarel, and Nozha Boujemaa. "Project-Imedia: Object Recognition." INRIA Activity Reports – RalyX. http://ralyx.inria.fr/2004/Raweb/imedia/uid84.html (accessed March 3, 2010).
Huang, Lingkang. "Variable Selection in Multi-class Support Vector Machine and Applications in Genomic Data Analysis." PhD thesis, 2008.
Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. "Nonlinear SVMs." The Stanford NLP (Natural Language Processing) Group. /IR-book/html/htmledition/nonlinear-svms-1.html (accessed March 3, 2010).
Fomel, Sergey. "Inverse B-spline Interpolation." Stanford Exploration Project, 2000. /public/docs/sep105/sergey2/paper_html/node5.html (accessed March 3, 2010).
Basak, Jayanta. "A Least Square Kernel Machine with Box Constraints." Proceedings of the International Conference on Pattern Recognition 2008, 1 (2008): 1-4.
Alashwal, H., Safaai Deris, and Razib M. Othman. "A Bayesian Kernel for the Prediction of Protein-Protein Interactions." International Journal of Computational Intelligence 5, no. 2 (2009): 119-124.
Sahbi, Hichem, and François Fleuret. "Kernel Methods and Scale Invariance Using the Triangular Kernel." INRIA Research Report N-5143, March 2004.
Boughorbel, Sabri, Jean-Philippe Tarel, and Nozha Boujemaa. "Generalized Histogram Intersection Kernel for Image Recognition." Proceedings of the 2005 Conference on Image Processing, volume 3, 161-164, 2005.
Micchelli, Charles. "Interpolation of Scattered Data: Distance Matrices and Conditionally Positive Definite Functions." Constructive Approximation 2, no. 1 (1986): 11-22.
Wikipedia contributors. "Kernel Methods." Wikipedia, The Free Encyclopedia. /w/index.php?title=Kernel_methods&oldid=340911970 (accessed March 3, 2010).
Wikipedia contributors. "Kernel Trick." Wikipedia, The Free Encyclopedia. /w/index.php?title=Kernel_trick&oldid=269422477 (accessed March 3, 2010).
Weisstein, Eric W. "Positive Semidefinite Matrix." From MathWorld – A Wolfram Web Resource. /PositiveSemidefiniteMatrix.html
Hamers, B. "Kernel Models for Large Scale Applications." PhD thesis, Katholieke Universiteit Leuven, Belgium, 2004.
Zhang, Li, Weida Zhou, and Licheng Jiao. "Wavelet Support Vector Machine." IEEE Transactions on Systems, Man, and Cybernetics, Part B 34, no. 1 (2004): 34-39.