1Lagrange Duality Formulate the Lagrange dual problem of the following
linear programming prob-lem min cT rs.t.Ax 二b where a ∈R is variable,c
∈ R",A ∈Rk×n, b ∈ Rk.
解:设拉格朗日函数为L(x,λ)=cTx+λT(Ax−b)\mathcal{L}(x,\lambda)=c^Tx+\lambda^T(Ax-b)L(x,λ)=cTx+λT(Ax−b),
对应的对偶函数为G(λ)=infλL(x,λ)\mathcal{G}(\lambda)=inf_{\lambda}\ \mathcal{L}(x,\lambda)G(λ)=infλ L(x,λ),
而LP问题与对偶问题强对偶,KTT 条件成立,满足 stationarity
∇xcTx∗+λ∗T(Ax−b)=0\nabla_{x}c^Tx^*+{\lambda^*}^T(Ax-b)=0∇xcTx∗+λ∗T(Ax−b)=0
⟹\Longrightarrow⟹ cT+λ∗TA=0c^T+{\lambda^*}^TA=0cT+λ∗TA=0
以及Ax∗−b=0Ax^*-b=0Ax∗−b=0,因此该点处拉格朗日函数可以表达为
L(x∗,λ∗)=(−λTA)(A−1b)+λT(Ax∗−b)\mathcal{L}(x^*,\lambda^*)=(-\lambda^TA)(A^{-1}b)+\lambda^T(Ax^*-b)L(x∗,λ∗)=(−λTA)(A−1b)+λT(Ax∗−b)
L(x∗,λ∗)=−λTb\mathcal{L}(x^*,\lambda^*)=-\lambda^T bL(x∗,λ∗)=−λTb
根据 Dual feasibility 得 λi≥0\lambda_i\geq 0λi≥0
LP问题的对偶问题标准形式为
maxλ−λTbs.t.λ≥0,cT+λTA=0max_{\lambda}\ -\lambda^T b \\ s.t. \lambda\geq 0,c^T+{\lambda}^TA=0 maxλ −λTbs.t.λ≥0,cT+λTA=0
这里补充一种做法:
将拉格朗日对偶函数变换为 G(λ)=infL(x,λ)=inf(cT+λTA)x−λTb\mathcal{G}(\lambda)=inf\mathcal{L}(x,\lambda)=inf(c^T+\lambda^TA)x-\lambda^TbG(λ)=infL(x,λ)=inf(cT+λTA)x−λTb,
当 cT+λTA=0c^T+\lambda^TA=0cT+λTA=0 时,G(λ)=−λTb\mathcal{G}(\lambda)=-\lambda^TbG(λ)=−λTb;
否则,G(λ)=∞\mathcal{G}(\lambda)=\inftyG(λ)=∞,不存在极值。
sVM
2.1Convex Functions Prove f(w) = w" . (where w ∈ R") is a convex function.2.2Soft-Margin for Separable Data Consider training a
soft-margin SVM with C set to some positive constant.Suppose the
training data is linearly separable. Since increasing the 6; can
onlyincrease the objective of the primal problem (which we are trying
to minimize),at the optimal solution to the primal problem,all the
training examples willhave functional margin at least 1 and all the i
will be equal to zero. True orfalse? Explain! Given a linearly
separable dataset, is it necessarily better to usea a hard margin SVM
over a soft-margin SVM?
2.3In-bound Support Vectors in Soft-Margin sVMs Examples ar() with a > 0 are called support vectors (SVs). For soft-marginsVM we distinguish
between in-bound SVs,for which 0Show that in-bound SVs lie exactly on the margin.Argue that bound SVs
can lie both on or in the margin,and that they will“usually” lie in
the margin. Hint: use the KKT conditions.
2.1证:ωTω\omega^T\omegaωTω是凸函数
⟺\iff⟺ ∣∣λx+(1−λ)y∣∣2≤λ∣∣x∣∣2+(1−λ)∣∣y∣∣||\lambda x+(1-\lambda)y||^2\leq \lambda||x||^2+(1-\lambda)||y||∣∣λx+(1−λ)y∣∣2≤λ∣∣x∣∣2+(1−λ)∣∣y∣∣
⟺\iff⟺λ∣∣x∣∣2+(1−λ)∣∣y∣∣−(λx+(1−λ)y)T(λx+(1−λ)y)≥0\lambda||x||^2+(1-\lambda)||y||-(\lambda x+(1-\lambda)y)^T(\lambda x+(1-\lambda)y)\geq 0λ∣∣x∣∣2+(1−λ)∣∣y∣∣−(λx+(1−λ)y)T(λx+(1−λ)y)≥0
⟺\iff⟺λ∣∣x∣∣2+(1−λ)∣∣y∣∣−(λxT+(1−λ)yT)(λx+(1−λ)y)≥0\lambda||x||^2+(1-\lambda)||y||-(\lambda x^T+(1-\lambda)y^T)(\lambda x+(1-\lambda)y)\geq 0λ∣∣x∣∣2+(1−λ)∣∣y∣∣−(λxT+(1−λ)yT)(λx+(1−λ)y)≥0
⟺\iff⟺λ∣∣x∣∣2+(1−λ)∣∣y∣∣−(λ2xTx+λ(1−λ)(yTx+yTx)+(1−λ)2yTy)λ(1−λ)(yTx+yTx)≥0\lambda||x||^2+(1-\lambda)||y||-(\lambda^2 x^Tx+\lambda(1-\lambda)(y^Tx+y^Tx)+(1-\lambda)^2y^Ty)\lambda(1-\lambda)(y^Tx+y^Tx)\geq 0λ∣∣x∣∣2+(1−λ)∣∣y∣∣−(λ2xTx+λ(1−λ)(yTx+yTx)+(1−λ)2yTy)λ(1−λ)(yTx+yTx)≥0
⟺\iff⟺(λ−λ2)xTx+(λ−λ2)yTy−λ(1−λ)(yTx+yTx)≥0(\lambda-\lambda^2)x^Tx+(\lambda-\lambda^2)y^Ty-\lambda(1-\lambda)(y^Tx+y^Tx)\geq 0(λ−λ2)xTx+(λ−λ2)yTy−λ(1−λ)(yTx+yTx)≥0
⟺\iff⟺(λ−λ2)xTx+(λ−λ2)yTy−λ(1−λ)(yTx+yTx)≥0(\lambda-\lambda^2)x^Tx+(\lambda-\lambda^2)y^Ty-\lambda(1-\lambda)(y^Tx+y^Tx)\geq 0(λ−λ2)xTx+(λ−λ2)yTy−λ(1−λ)(yTx+yTx)≥0
而λ∈[0,1]\lambda\in[0,1]λ∈[0,1],因此λ≥λ2\lambda\geq \lambda^2λ≥λ2,
⟺\iff⟺xTx+yTy−(yTx+yTx)≥0x^Tx+y^Ty-(y^Tx+y^Tx)\geq 0xTx+yTy−(yTx+yTx)≥0
⟺\iff⟺(xT−yT)(x−y)≥0(x^T-y^T)(x-y)\geq 0(xT−yT)(x−y)≥0
⟺\iff⟺∣∣x−y∣∣2≥0||x-y||^2\geq 0∣∣x−y∣∣2≥0
而∣∣x−y∣∣2≥0||x-y||^2\geq 0∣∣x−y∣∣2≥0成立,故ωTω\omega^T\omegaωTω是凸函数,证毕。
2.2不一定,软间隔SVM模型表达为
minω,b,ξ12∣∣ω∣∣2+C∑i=1mξis.t.y(i)(ωTx(i)+b)≥1−ξiξi≥0,∀i=1,2,...,mmin_{\omega,b,\xi}\frac{1}{2}||\omega||^2+C\sum^m_{i=1}\xi_i \\ s.t. y^{(i)}(\omega^Tx^{(i)}+b)\geq1-\xi_i \\ \xi_i\geq0,\forall i=1,2,...,m minω,b,ξ21∣∣ω∣∣2+Ci=1∑mξis.t.y(i)(ωTx(i)+b)≥1−ξiξi≥0,∀i=1,2,...,m
考虑一维情形如下
令∀ξi=0\forall\xi_i=0∀ξi=0,即退化为硬间隔SVM,求得决策边界为ω1\omega_1ω1;
令ξj=0,j≠i\xi_j=0,j\neq iξj=0,j=i,求得决策边界为ω2\omega_2ω2;
目标函数设为fff,f(ω1)=12ω12f(\omega_1)=\frac{1}{2}\omega_1^2f(ω1)=21ω12,f(ω2)=12ω22+Cξif(\omega_2)=\frac{1}{2}\omega_2^2+C\xi_if(ω2)=21ω22+Cξi,
当12ω12>12ω22+Cξi\frac{1}{2}\omega_1^2>\frac{1}{2}\omega_2^2+C\xi_i21ω12>21ω22+Cξi时,ξi\xi_iξi可以不为0,ω2\omega_2ω2优于ω1\omega_1ω1,因而最优解一定不是ω1\omega_1ω1.
软间隔SVM可以避免过拟合,正如上面的例子,右侧橙色点可能是噪声,用硬间隔SVM会拟合噪声;
相反,前者通过松弛变量,泛化模型,提高鲁棒性,因此某些情况下有必要使用软间隔SVM。
2.3①当0<αi∗ 根据KTT条件αi∗+ri∗=C\alpha^*_i+r^*_i=Cαi∗+ri∗=C得0 又因为ri∗ξi∗=0r^*_i\xi^*_i=0ri∗ξi∗=0,所以ξi∗=0\xi^*_i=0ξi∗=0, 因为αi∗(y(i)(ω∗Tx(i)+b∗)+ξi∗−1)=0\alpha^*_i(y^{(i)}({\omega^*}^Tx^{(i)}+b^*)+\xi^*_i-1)=0αi∗(y(i)(ω∗Tx(i)+b∗)+ξi∗−1)=0, 所以y(i)(ω∗Tx(i)+b∗)+ξi∗−1=0y^{(i)}({\omega^*}^Tx^{(i)}+b^*)+\xi^*_i-1=0y(i)(ω∗Tx(i)+b∗)+ξi∗−1=0, 所以y(i)(ω∗Tx(i)+b∗)=1y^{(i)}({\omega^*}^Tx^{(i)}+b^*)=1y(i)(ω∗Tx(i)+b∗)=1, 即 in-bound SVs 在支撑平面上。 ②当αi∗=C\alpha^*_i=Cαi∗=C时,类似的可以得到y(i)(ω∗Tx(i)+b∗)+ξi∗−1=0y^{(i)}({\omega^*}^Tx^{(i)}+b^*)+\xi^*_i-1=0y(i)(ω∗Tx(i)+b∗)+ξi∗−1=0, 而ξi∗≥0\xi^*_i\geq0ξi∗≥0,因此y(i)(ω∗Tx(i)+b∗)≤1y^{(i)}({\omega^*}^Tx^{(i)}+b^*)\leq1y(i)(ω∗Tx(i)+b∗)≤1, 即 bound SVs 在支撑平面上或者在间隔内。 而往往少数的点就能确定支撑平面(n 维空间 n 个点确定一个 boundary),因此大部分的点在间隔内。