A Simple Introduction to Support Vector Machines
Martin Law
Lecture for CSE 802
Department of Computer Science and Engineering, Michigan State University

Outline
- A brief history of SVM
- Large-margin linear classifier
  - Linearly separable case
  - Non-linearly separable case
- Creating nonlinear classifiers: the kernel trick
- A simple example
- Discussion on SVM
- Conclusion

History of SVM
- SVM is closely related to statistical learning theory [3].
- SVM was first introduced in 1992 [1].
- SVM became popular because of its success in handwritten digit recognition: a 1.1% test error rate for SVM, the same as the error rate of a carefully constructed neural network, LeNet-4. See Section 5.11 in [2] or the discussion in [3] for details.
- SVM is now regarded as an important example of "kernel methods", one of the key areas in machine learning.
- Note: the meaning of "kernel" here is different from that of the "kernel" function used in Parzen windows.

References
[1] B. E. Boser et al. A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144-152, Pittsburgh, 1992.
[2] L. Bottou et al. Comparison of classifier methods: a case study in handwritten digit recognition. Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 2, pp. 77-82.
[3] V. Vapnik. The Nature of Statistical Learning Theory. 2nd edition, Springer, 1999.
What is a good Decision Boundary?
- Consider a two-class, linearly separable classification problem.
- There are many possible decision boundaries!
- The Perceptron algorithm can be used to find such a boundary (a minimal sketch follows).
- Different algorithms have been proposed (DHS ch. 5).
- Are all decision boundaries equally good?
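For reference, the Perceptron mentioned above fits in a few lines. This is a minimal NumPy sketch (my own illustration, not from the slides); it returns some separating hyperplane, with no guarantee about the margin:

```python
import numpy as np

def perceptron(X, y, epochs=100):
    """Find *a* separating hyperplane w^T x + b = 0 for labels y in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:        # misclassified (or on the boundary)
                w, b = w + yi * xi, b + yi    # nudge the boundary toward xi
                mistakes += 1
        if mistakes == 0:                     # all training points correct: stop
            break
    return w, b
```

Depending on the order of the data it returns different consistent boundaries, which is exactly the point of the question above.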
Examples of Bad Decision Boundaries
[Figure: two panels, each showing Class 1 and Class 2 with a separating boundary that passes needlessly close to the points of one class.]

Large-margin Decision Boundary
- The decision boundary should be as far away from the data of both classes as possible: we should maximize the margin, m.
- The distance between the origin and the line w^T x = k is k/‖w‖; hence the distance between the two margin hyperplanes w^T x + b = 1 and w^T x + b = −1 is m = 2/‖w‖.
- [Figure: Class 1 and Class 2 separated by a hyperplane, with the margin m marked between them.]

Finding the Decision Boundary
- Let {x_1, ..., x_n} be our data set and let y_i ∈ {1, −1} be the class label of x_i.
- The decision boundary should classify all points correctly: y_i (w^T x_i + b) ≥ 1 for all i.
- The decision boundary can be found by solving the following constrained optimization problem (a tiny worked instance follows):

  minimize (1/2)‖w‖²
  subject to y_i (w^T x_i + b) ≥ 1 for all i.

- This is a constrained optimization problem. Solving it requires some new tools.
- Feel free to ignore the following several slides; what is important is the constrained optimization problem above.
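A tiny worked instance (my own example, not from the slides), with one point per class:

```latex
% Assumed data: x_1 = (1,1) with y_1 = +1, and x_2 = (-1,-1) with y_2 = -1.
\min_{w,b}\ \tfrac{1}{2}\lVert w\rVert^2
\quad\text{s.t.}\quad w^\top x_1 + b \ge 1,
\qquad -(w^\top x_2 + b) \ge 1 .
```

Both constraints are tight at the optimum; by symmetry b = 0 and w = (1/2, 1/2), giving margin m = 2/‖w‖ = 2√2, which is exactly the distance between the two points, as it must be when both are support vectors.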
Recap of Constrained Optimization
- Suppose we want to minimize f(x) subject to g(x) = 0.
- A necessary condition for x_0 to be a solution is ∇f(x_0) + α ∇g(x_0) = 0, where α is the Lagrange multiplier.
- For multiple constraints g_i(x) = 0, i = 1, ..., m, we need one Lagrange multiplier α_i for each constraint.

Recap of Constrained Optimization
- The case of an inequality constraint g_i(x) ≤ 0 is similar, except that the Lagrange multiplier α_i must be non-negative.
- If x_0 is a solution of the constrained optimization problem
  min f(x) subject to g_i(x) ≤ 0, i = 1, ..., m,
  there must exist α_i ≥ 0, i = 1, ..., m, such that x_0 satisfies ∇f(x_0) + Σ_i α_i ∇g_i(x_0) = 0.
- The function f(x) + Σ_i α_i g_i(x) is also known as the Lagrangian; we want to set its gradient to 0 (a one-line example follows).
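A one-line illustration of the equality-constrained case (my own example): minimize f(x) = x² subject to g(x) = x − 1 = 0.

```latex
\nabla f(x_0) + \alpha\,\nabla g(x_0) = 2x_0 + \alpha = 0,
\qquad g(x_0) = 0 \ \Rightarrow\ x_0 = 1,\ \alpha = -2 .
```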
Back to the Original Problem
- The Lagrangian is
  L = (1/2) w^T w − Σ_i α_i ( y_i (w^T x_i + b) − 1 ).
  Note that ‖w‖² = w^T w.
- Setting the gradient of L w.r.t. w and b to zero, we have
  w = Σ_i α_i y_i x_i   and   Σ_i α_i y_i = 0.

The Dual Problem
- If we substitute w = Σ_i α_i y_i x_i into L, we have
  W(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j.
- Note that Σ_i α_i y_i = 0, so the term involving b vanishes.
- This is a function of the α_i only (the substitution is written out below).
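Written out, the substitution collapses the Lagrangian to the dual objective (standard algebra, filled in where the slides showed only the result):

```latex
\begin{aligned}
L &= \tfrac{1}{2}\,w^\top w - \sum_i \alpha_i\bigl(y_i(w^\top x_i + b) - 1\bigr)\\
  &= \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j\,x_i^\top x_j
     - \sum_{i,j}\alpha_i\alpha_j y_i y_j\,x_i^\top x_j
     - b\sum_i \alpha_i y_i + \sum_i \alpha_i\\
  &= \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j\,x_i^\top x_j
  \qquad\text{(using }\textstyle\sum_i \alpha_i y_i = 0\text{)} .
\end{aligned}
```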
The Dual Problem
- The new objective function is in terms of the α_i only.
- It is known as the dual problem: if we know w, we know all α_i; if we know all α_i, we know w.
- The original problem is known as the primal problem.
- The objective function of the dual problem needs to be maximized!
- The dual problem is therefore:

  maximize W(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j
  subject to α_i ≥ 0 and Σ_i α_i y_i = 0.

- α_i ≥ 0 is the property of the α_i when we introduce the Lagrange multipliers; Σ_i α_i y_i = 0 is the result of differentiating the original Lagrangian w.r.t. b.
The Dual Problem
- This is a quadratic programming (QP) problem: a global maximum over the α_i can always be found.
- w can be recovered by w = Σ_i α_i y_i x_i (a sketch of solving this dual numerically follows).
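A minimal sketch of solving this dual with the cvxopt QP solver, on a made-up four-point dataset (cvxopt minimizes, so the objective is negated; the tiny ridge on P is only for numerical stability):

```python
import numpy as np
from cvxopt import matrix, solvers

# Assumed toy data: two points per class, linearly separable
X = np.array([[2.0, 2.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# Dual: max  sum_i a_i - 1/2 a^T P a,  with  P_ij = y_i y_j x_i^T x_j.
# cvxopt solves  min 1/2 a^T P a + q^T a  s.t.  G a <= h,  A a = b.
P = matrix(np.outer(y, y) * (X @ X.T) + 1e-8 * np.eye(n))
q = matrix(-np.ones(n))
G = matrix(-np.eye(n))            # -a_i <= 0  encodes  a_i >= 0
h = matrix(np.zeros(n))
A = matrix(y[None, :])            # equality constraint  sum_i a_i y_i = 0
b = matrix(0.0)

solvers.options["show_progress"] = False
alpha = np.array(solvers.qp(P, q, G, h, A, b)["x"]).ravel()

w = (alpha * y) @ X               # w = sum_i a_i y_i x_i
sv = alpha > 1e-6                 # support vectors: nonzero multipliers
b0 = np.mean(y[sv] - X[sv] @ w)   # b from any tight constraint
print(alpha.round(4), w.round(4), round(b0, 4))
```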
Characteristics of the Solution
- Many of the α_i are zero: w is a linear combination of a small number of data points. This "sparse" representation can be viewed as data compression, as in the construction of the k-NN classifier.
- The x_i with non-zero α_i are called support vectors (SV). The decision boundary is determined only by the SVs.
- Let t_j (j = 1, ..., s) be the indices of the s support vectors. We can write w = Σ_{j=1..s} α_{t_j} y_{t_j} x_{t_j}.
- For testing with a new datum z: compute w^T z + b = Σ_{j=1..s} α_{t_j} y_{t_j} (x_{t_j}^T z) + b and classify z as class 1 if the sum is positive, and class 2 otherwise.
- Note: w need not be formed explicitly, as the sketch below shows.
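The same evaluation can be done without ever materializing w. A sketch with scikit-learn's SVC, whose dual_coef_ attribute stores the products α_i y_i for the support vectors (toy data as before; the very large C approximates a hard margin):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

z = np.array([1.0, 0.5])
# Support-vector expansion:  f(z) = sum_j (a_j y_j) x_j^T z + b
f = (clf.dual_coef_ @ (clf.support_vectors_ @ z) + clf.intercept_).item()
print(f, clf.decision_function([z])[0])   # the two values agree
print("class 1" if f > 0 else "class 2")
```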
The Quadratic Programming Problem
- Many approaches have been proposed: LOQO, CPLEX, etc. (see http://www.numerical.rl.ac.uk/qp/qp.html).
- Most are "interior-point" methods:
  - Start with an initial solution that can violate the constraints.
  - Improve this solution by optimizing the objective function and/or reducing the amount of constraint violation.
- For SVM, sequential minimal optimization (SMO) seems to be the most popular.
  - A QP with two variables is trivial to solve.
  - Each iteration of SMO picks a pair (α_i, α_j) and solves the QP with these two variables; repeat until convergence (a sketch of this pairwise step follows).
- In practice, we can just regard the QP solver as a "black box" without bothering about how it works.
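The heart of SMO is that closed-form two-variable solve. Below is a sketch of a single pairwise update in the style of Platt's simplified SMO (linear kernel, no selection heuristics; the variable names are mine):

```python
import numpy as np

def smo_pair_step(i, j, alpha, X, y, b, C):
    """One analytic SMO update on the pair (alpha_i, alpha_j); returns (alpha, b)."""
    K = X @ X.T
    E = (alpha * y) @ K + b - y               # prediction errors f(x_k) - y_k
    # Box [L, H]: keep 0 <= alpha <= C while sum_k alpha_k y_k stays fixed
    if y[i] != y[j]:
        L, H = max(0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = 2 * K[i, j] - K[i, i] - K[j, j]     # curvature along the pair
    if L >= H or eta >= 0:
        return alpha, b                       # no progress possible on this pair
    a_j = float(np.clip(alpha[j] - y[j] * (E[i] - E[j]) / eta, L, H))
    a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j)  # restore equality constraint
    # Threshold update (Platt): keep KKT conditions consistent on the pair
    b1 = b - E[i] - y[i] * (a_i - alpha[i]) * K[i, i] - y[j] * (a_j - alpha[j]) * K[i, j]
    b2 = b - E[j] - y[i] * (a_i - alpha[i]) * K[i, j] - y[j] * (a_j - alpha[j]) * K[j, j]
    b = b1 if 0 < a_i < C else b2 if 0 < a_j < C else (b1 + b2) / 2
    alpha = alpha.copy()
    alpha[i], alpha[j] = a_i, a_j
    return alpha, b
```

Sweeping such steps over pairs until no α_i changes is the whole algorithm; production solvers add heuristics for choosing (i, j).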
A Geometrical Interpretation
[Figure: the two classes with the large-margin boundary; most points have α_i = 0 (α_2 through α_5, α_7, α_9, α_10), while the support vectors carry α_1 = 0.8, α_6 = 1.4, and α_8 = 0.6.]

Non-linearly Separable Problems
- We allow "errors" ξ_i in classification; each is based on the output of the discriminant function w^T x + b.
- Σ_i ξ_i approximates the number of misclassified samples.
Soft Margin Hyperplane
- If we minimize Σ_i ξ_i, the ξ_i can be computed from
  y_i (w^T x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0.
- The ξ_i are "slack variables" in the optimization. Note that ξ_i = 0 if there is no error for x_i.
- Σ_i ξ_i is an upper bound on the number of errors.
- We want to minimize (1/2)‖w‖² + C Σ_i ξ_i, where C is the tradeoff parameter between error and margin.
- The optimization problem becomes:

  minimize (1/2)‖w‖² + C Σ_i ξ_i
  subject to y_i (w^T x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for all i.

The Optimization Problem
- The dual of this new constrained optimization problem is

  maximize W(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j
  subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0.

- w is recovered as w = Σ_i α_i y_i x_i.
- This is very similar to the optimization problem in the linearly separable case, except that there is now an upper bound C on the α_i.
- Once again, a QP solver can be used to find the α_i (the sketch below shows the effect of C).
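A quick way to see C acting as the error/margin tradeoff is to vary it on overlapping data. A sketch with scikit-learn (the data and constants are my own; smaller C tolerates more slack and keeps a wider margin):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 1.5, (20, 2)), rng.normal(-2, 1.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)   # the two classes overlap a little

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_)
    print(f"C={C:>6}: margin = {margin:.3f}, support vectors = {len(clf.support_)}")
```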
Extension to Non-linear Decision Boundary
- So far we have only considered large-margin classifiers with a linear decision boundary. How can we generalize them to become nonlinear?
- Key idea: transform x_i to a higher-dimensional space to "make life easier".
  - Input space: the space where the points x_i are located.
  - Feature space: the space of the φ(x_i) after transformation.
- Why transform?
  - A linear operation in the feature space is equivalent to a non-linear operation in the input space.
  - Classification can become easier with a proper transformation. In the XOR problem, for example, adding the product feature x_1 x_2 makes the problem linearly separable (a sketch follows).
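A quick check of the XOR claim (my own illustration): with the extra product coordinate, a linear classifier in the feature space separates all four points, which no line in the input space can do.

```python
import numpy as np
from sklearn.svm import SVC

# XOR: labels are the sign of x1 * x2; not linearly separable in 2-D
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
y = np.array([1, -1, -1, 1])

# Feature map phi(x) = (x1, x2, x1*x2): one extra, product, coordinate
Phi = np.hstack([X, (X[:, 0] * X[:, 1])[:, None]])

clf = SVC(kernel="linear", C=1e6).fit(Phi, y)
print(clf.predict(Phi))   # [ 1 -1 -1  1] -- all four classified correctly
```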