
Support Vector Machine


How the algorithm works:

[Image: two classes separated by a hyperplane]
As the image above shows, an SVM separates different classes with a hyperplane. To keep the explanation simple, I will use a 2D graph, where the hyperplane is just a line.

[Image: the 2D case]

SVM works by finding the optimal weights and bias that separate the two classes. In the 2D case, the separating line has the equation w*x - b = 0. When w*x - b > 0 the model predicts class 1, and when w*x - b < 0 it predicts class 2.
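In code, this decision rule is just the sign of w*x - b. A minimal sketch, assuming some made-up, already-trained weights and bias (not values learned here):

```python
import numpy as np

# Hypothetical trained parameters for a 2D example (assumed, not learned here).
w = np.array([2.0, -1.0])
b = 0.5

def predict(x):
    """Linear SVM decision rule: class 1 if w·x - b > 0, else class 2."""
    return 1 if np.dot(w, x) - b > 0 else 2

print(predict(np.array([1.0, 0.0])))  # w·x - b = 2.0 - 0.5 = 1.5 > 0 → 1
print(predict(np.array([0.0, 1.0])))  # w·x - b = -1.0 - 0.5 = -1.5 < 0 → 2
```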

What the SVM model tries to do is maximize the margin between the support vectors (the blue and green points lying on w*x - b = 1 and w*x - b = -1; that is, the points on the margin boundaries).

Hyperplane Definition
  • w*x - b ≥ +1 (yi = +1) {where +1 = class 1}
  • w*x - b ≤ -1 (yi = -1) {where -1 = class 2}
The two can be combined into a single constraint: yi ( w*x - b ) ≥ 1
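The combined form can be sanity-checked numerically. A small sketch with hypothetical parameters and boundary points (all assumed for illustration), showing that a point on either boundary satisfies yi*(w*x - b) ≥ 1:

```python
import numpy as np

# Hypothetical parameters and boundary points (assumed for illustration).
w, b = np.array([1.0, 1.0]), 2.0

x_pos, y_pos = np.array([2.0, 1.0]), +1   # w·x - b = +1 → on the class +1 boundary
x_neg, y_neg = np.array([0.0, 1.0]), -1   # w·x - b = -1 → on the class -1 boundary

# Both cases collapse to the same combined constraint yi*(w·x - b) >= 1.
print(y_pos * (np.dot(w, x_pos) - b))  # 1.0
print(y_neg * (np.dot(w, x_neg) - b))  # 1.0
```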

Finding the Separation of the Margin:
We know that w*x - b = +1 and w*x - b = -1 are the lines passing through the support vectors. Let x1 be a support vector for class +1 and x2 a support vector for class -1. Then the two lines become w*x1 - b = +1 and w*x2 - b = -1.

If we treat these as simultaneous equations and subtract the second from the first, we get w*(x1 - x2) = 2. Dividing both sides by the magnitude of w gives the distance between the two boundary lines:
(w*(x1 - x2))/||w|| = 2/||w||

Margin: 2/||w||

As we are trying to maximize the margin, we need to maximize 2/||w||,

which is the same as minimizing ||w||. For convenience, this is usually rewritten as minimizing 0.5*||w||^2 (squaring keeps the same minimizer but makes the objective easier to differentiate).
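As a quick numeric illustration of the margin formula (with an assumed weight vector, not one learned from data):

```python
import numpy as np

# Hypothetical weight vector (assumed for illustration); ||w|| = 5.
w = np.array([3.0, 4.0])

# Width of the band between w·x - b = +1 and w·x - b = -1.
margin = 2 / np.linalg.norm(w)
print(margin)  # 0.4

# Maximizing 2/||w|| is equivalent to minimizing 0.5 * ||w||^2.
objective = 0.5 * np.linalg.norm(w) ** 2
print(objective)  # 12.5
```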

To find the optimal values of w and b, the soft-margin SVM solves:

min over (w, b) of: 0.5*||w||^2 + C * Σ Zi

where C is the regularization parameter (how heavily errors are penalized)
and Σ Zi is the total error (the sum of the slack values Zi).

C is a regularization parameter that lets the model accept a certain number of outliers. In the image below, two points sit on the wrong side of the hyperplane; depending on the value of C, the model will not try to tweak itself to divide the points perfectly. This is important because it can prevent your model from overfitting.

You can think of each Z as how far a point falls past its margin boundary (the lines w*x - b = 1 and w*x - b = -1) on the wrong side; Z is 0 for points correctly placed outside the margin. In the image below, the magnitudes of the ξ vectors are the values of Z.

[Image: slack values ξ for points on the wrong side of their margin boundary]
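Putting the pieces together, the objective above can be minimized with subgradient descent on the hinge loss. This is my own minimal sketch (not the post's code), and the toy data is made up:

```python
import numpy as np

# A minimal subgradient-descent sketch for the soft-margin objective
# 0.5*||w||^2 + C * sum(Z), where each point's slack is
# Z_i = max(0, 1 - y_i*(w·x_i - b)). Learning rate and epoch count are
# assumptions chosen for this tiny example.
def train_svm(X, y, C=1.0, lr=0.01, epochs=2000):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) - b) < 1:   # inside the margin: slack is active
                w -= lr * (w - C * yi * xi)    # regularizer + hinge subgradient
                b -= lr * (C * yi)
            else:
                w -= lr * w                    # only the regularizer pulls on w
    return w, b

# Toy linearly separable data (made up for illustration): two 2D clusters.
X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [3., 3.], [3., 4.], [4., 3.]])
y = np.array([-1, -1, -1, 1, 1, 1])

w, b = train_svm(X, y)
print(np.sign(X @ w - b))   # recovers the labels on this toy set
```

On separable data like this the slack terms go to zero and the result approaches the hard-margin solution; shrinking C would let some points sit inside the margin instead.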