MachineLearning|8. Support Vector Machines and Kernels

2024/02/13 MachineLearning

![[cd673ddf50ef87de550f7a2cba69a53.png]]

Separating Hyperplanes

  • when you want to find a decision boundary, avoid estimating densities
  • linear decision boundary: setting this expression to 0 gives the decision boundary; hyperplanes separate feature space into regions ![[ca46b3d627febe82dbbee7126a1c74a.png]]
  • for a new X, plug it into the last expression and classify according to whether y = 1 or y = −1 (a worked sketch follows this list). exercise 1. This problem involves hyperplanes in two dimensions. (a) Sketch the hyperplane 1 + 3X1 − X2 = 0. Indicate the set of points for which 1 + 3X1 − X2 > 0, as well as the set of points for which 1 + 3X1 − X2 < 0. (b) On the same plot, sketch the hyperplane −2 + X1 + 2X2 = 0. Indicate the set of points for which −2 + X1 + 2X2 > 0, as well as the set of points for which −2 + X1 + 2X2 < 0.
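    A minimal sketch of exercise 1, classifying points by the sign of each hyperplane expression (the grid ranges and the test point are my own illustrative choices):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hyperplanes from exercise 1, written as f(X1, X2) = 0
f_a = lambda X1, X2: 1 + 3 * X1 - X2    # (a) 1 + 3*X1 - X2 = 0
f_b = lambda X1, X2: -2 + X1 + 2 * X2   # (b) -2 + X1 + 2*X2 = 0

# Classify a new point by the sign: y = +1 if f > 0, y = -1 if f < 0
x_new = (0.5, 1.0)
print("sign for (a):", np.sign(f_a(*x_new)))
print("sign for (b):", np.sign(f_b(*x_new)))

# Shade the f > 0 / f < 0 regions of (a) and draw both hyperplanes
X1, X2 = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
plt.contourf(X1, X2, np.sign(f_a(X1, X2)), alpha=0.2, cmap="coolwarm")
plt.contour(X1, X2, f_a(X1, X2), levels=[0], colors="k")  # hyperplane (a)
plt.contour(X1, X2, f_b(X1, X2), levels=[0], colors="g")  # hyperplane (b)
plt.xlabel("X1"); plt.ylabel("X2")
plt.show()
```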

    Maximum margin classifiers

    (how to find the best hyperplane)

  • generalization link to regularization
  • for generalization, the decision boundary should lie between the class boundaries
  • Maximize the perpendicular distance between the decision boundary and the ==nearest observations==: the margin M. The decision boundary then only depends on a few points on the margin, the support vectors: if you remove one of the other observations, nothing changes
  • therefore, leave-one-out will fail: leaving out a single non-support-vector observation does not change the solution

    Construction

    maximize the margin, under the constraint that all training observations are classified correctly
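    In ISLR-style notation (my own rendering; the course slides show this only as images), the maximum margin classifier solves:

```latex
\max_{\beta_0,\beta_1,\dots,\beta_p,\,M} \; M
\quad \text{subject to} \quad
\sum_{j=1}^{p} \beta_j^2 = 1,
\qquad
y_i\,(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}) \ge M
\quad \text{for all } i = 1,\dots,n.
```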

    Limitation

  • separable classes
  • linear separability
  • two classes

    Problems

  • Maximum margin classifier is prone to ==overfitting==: very sensitive to training set
  • When classes ==overlap==, separating hyperplane does not exist
  • We need to make a trade-off between errors on the training set and predicted performance on the test set (generalization)

    Solution: The soft margin

    To address the problems above

  • Solution: ==allow (some, small) errors on the training set,== introducing slack variables εᵢ ≥ 0. Add slack variables to the maximum margin classifier, but limit the total slack to C: the trade-off parameter
  • Change: the margin constraint M becomes M(1 − εᵢ), and a budget C is introduced with the total slack Σᵢ εᵢ ≤ C
  • C influences the solution: there is no a priori best choice for C ![[7b9e34d7c7461765eca88a763603261.png]]
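    A minimal sketch of exploring the trade-off parameter by cross-validation (note that scikit-learn's C in SVC is an error-penalty weight, which plays the inverse role of the slack budget C described above; the data and candidate values are made up for illustration):

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two overlapping classes, so a soft margin is required
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

# Larger sklearn C penalizes slack more heavily (harder margin),
# smaller C tolerates more training errors (softer margin)
for C in [0.01, 0.1, 1, 10, 100]:
    clf = SVC(kernel="linear", C=C)
    cv_acc = cross_val_score(clf, X, y, cv=5).mean()
    n_sv = clf.fit(X, y).n_support_.sum()
    print(f"C={C:>6}: CV accuracy = {cv_acc:.3f}, support vectors = {n_sv}")
```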

    The multi-class case

  • one-versus-one
  • one-versus-all (see the sketch after this list) exercise ![[b5b665fde36990c4b2cb1fd1525e3e6.png]]
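    A minimal sketch contrasting the two strategies with scikit-learn's meta-estimators (the dataset and kernel settings are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # 3 classes

# one-versus-one: one binary SVC per pair of classes, K(K-1)/2 classifiers
ovo = OneVsOneClassifier(SVC(kernel="linear", C=1))
# one-versus-all (rest): one binary SVC per class, K classifiers
ovr = OneVsRestClassifier(SVC(kernel="linear", C=1))

print("one-vs-one :", cross_val_score(ovo, X, y, cv=5).mean())
print("one-vs-rest:", cross_val_score(ovr, X, y, cv=5).mean())
```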

    Optimization (optional)

    From maximizing the margin M to minimizing the squared norm of a weight vector w (plus a bias term b). Because the result is a (large) quadratic programming (QP) problem, Lagrange multipliers αᵢ are introduced.
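    In the usual textbook notation (my own rendering, not taken from the slide images), the primal problem and its Lagrangian dual are:

```latex
% Primal: equivalent reformulation of margin maximization
\min_{\mathbf{w},\,b} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
\quad \text{subject to} \quad
y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1, \quad i = 1,\dots,n.

% Dual: obtained via Lagrange multipliers \alpha_i \ge 0; only inner
% products x_i^T x_{i'} appear, which is what the kernel trick exploits
\max_{\boldsymbol{\alpha}} \;
\sum_{i=1}^{n} \alpha_i
- \tfrac{1}{2} \sum_{i=1}^{n} \sum_{i'=1}^{n}
  \alpha_i \alpha_{i'} \, y_i y_{i'} \, \mathbf{x}_i^\top \mathbf{x}_{i'}
\quad \text{subject to} \quad
\alpha_i \ge 0, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0.
```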

    The nonlinear case

    Idea: classes may become linearly separable if higher-order terms are added (cf. nonlinear regression) ![[22e573c5252d8db1948d586ddaa54dd.png]]
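    A minimal illustration of this idea (the toy data are my own, not from the notes): one-dimensional points with class +1 in the middle and class −1 on both sides are not linearly separable, but adding the squared term as a second feature makes them separable.

```python
import numpy as np
from sklearn.svm import SVC

# Class +1 sits between two groups of class -1: no single threshold on x separates them
x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0])
y = np.array([-1, -1, -1, 1, 1, 1, -1, -1, -1])

linear_1d = SVC(kernel="linear").fit(x.reshape(-1, 1), y)
print("1-D training accuracy:", linear_1d.score(x.reshape(-1, 1), y))  # below 1.0

# Add the higher-order term x**2 as an extra feature: a line now separates the classes
X_expanded = np.column_stack([x, x ** 2])
linear_2d = SVC(kernel="linear").fit(X_expanded, y)
print("expanded training accuracy:", linear_2d.score(X_expanded, y))   # 1.0
```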

    Questions

  • should the features be squared, square-rooted, etc.? It is not clear in advance which transformation to use
  • how to train the SVC efficiently: with p = 10 features, including all polynomial terms up to degree 3 already gives 286 combinations (see the count below)
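    One way to see where 286 comes from (assuming the count includes the constant term and all monomials up to degree d = 3 in p = 10 variables):

```latex
\#\{\text{monomials of degree} \le d \text{ in } p \text{ variables}\}
= \binom{p + d}{d}
= \binom{13}{3}
= 286.
```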

    The Kernel Trick

    ![[4a4865d0f12df10785319a4f7353e67.png]] During training we need to compute the inner products between all pairs of training samples, e.g. between xᵢ and xᵢ′. Once training is done and a new sample has to be classified, we only need the inner products between the support vectors and the new sample. ![[bc20a49a601a7f34eaf96a7d8938959.png]] A kernel function is introduced to generalize this inner product. Think of kernel functions as similarities: large when the inputs are very alike, small when they are not ![[ff68f02e96fdb61695f150a047643ef.png]] ![[50b2ca5c56b26fa1a3d05635b85e2dd.png]] At this point there are two common choices for K(x_i, x):

  • Polynomial kernel
  • Radial kernel
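    A minimal sketch of the trick for the degree-2 polynomial kernel (the feature map and inputs are illustrative): K(x, x′) = (1 + x⊤x′)² equals an inner product in an expanded feature space, without that space ever being constructed.

```python
import numpy as np

def poly_kernel(x, z, degree=2):
    # Kernel trick: K(x, z) = (1 + x.z)^degree, computed entirely in the original space
    return (1 + x @ z) ** degree

def phi(x):
    # Explicit degree-2 feature map for 2-D inputs (6 expanded features)
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(poly_kernel(x, z))   # kernel evaluated in the original 2-D space
print(phi(x) @ phi(z))     # the same value via the explicit expanded features
```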

    The support vector machine

    Choosing Kernels

    What kernel function should we use? Kernel type: prior knowledge of the problem, trial-and-error. Kernel parameters: cross-validation, as for C. exercise ![[f915649b7e14afa4dc5ab759f53800c.png]]
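    A minimal sketch of choosing the kernel and its parameters by cross-validation (the dataset and parameter grid are arbitrary illustrations):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Search jointly over kernel type, C and the kernel parameters
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["poly"],   "svc__C": [0.1, 1, 10], "svc__degree": [2, 3]},
    {"svc__kernel": ["rbf"],    "svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]},
]
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```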

    More Kernels

    A large number of kernels have been proposed, not limited to numerical/vector data!

  • Vector kernels
  • Set kernels
  • String kernels
  • Empirical kernel map
  • Kernel kernels
  • Kernel combination
  • Kernels on graphs
  • Kernels in graphs
  • Kernels on probabilistic models

    Recap

  • Separating hyperplane: any plane that separates classes
  • The support vector machine has evolved from fundamental work by Vapnik: separable, linear problems → maximum margin classifier; non-separable, linear problems (add slack variables) → support vector classifier; non-separable, non-linear problems (use the kernel trick) → support vector machine
  • Training involves quadratic programming (optimization)
  • The final classifier only depends on the support vectors

  • SVMs are widely used and work well in many cases, but care needs to be taken in selecting C, the kernel type and its parameters (using cross-validation)
  • The kernel trick (replacing inner products by more general kernel functions) can be applied in many other algorithms
  • Many kernels have been proposed for non-vector data, e.g. sets, strings, graphs etc.: very useful in bioinformatics, vision, document analysis etc.
  • SVMs are linked to logistic regression (section 9.5, not discussed here)
