
Support Vector Machine

How the algorithm works:

[Image: two classes separated by an SVM hyperplane]

As the image above shows, SVM works by dividing different classes with a hyperplane. To explain how the algorithm works, I will use a 2D graph, where the hyperplane reduces to a line.



[Image: the 2D case]



SVM works by finding the optimal weights w and bias b that separate the two classes. In 2D, the separating line has the equation w*x - b = 0. When w*x - b > 0 the model predicts class 1, and when w*x - b < 0 it predicts class 2.
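As a minimal sketch of this decision rule in Python (the weights, bias, and test points here are made-up illustrations, not learned values):

```python
import numpy as np

# Hypothetical (not learned) parameters for a 2D problem
w = np.array([0.4, -0.9])
b = -0.5

def predict(x):
    """Class 1 if w*x - b > 0, class 2 if w*x - b < 0."""
    return 1 if np.dot(w, x) - b > 0 else 2

print(predict(np.array([3.0, 1.0])))   # 1: falls on the class 1 side
print(predict(np.array([-2.0, 1.0])))  # 2: falls on the class 2 side
```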

What the SVM model tries to do is maximize the margin between the support vectors (the blue and green points lying on w*x - b = +1 and w*x - b = -1; the points sitting on the margin boundaries).

Hyperplane Definition
  • w*x - b ≥ +1 (yi = +1) {where +1 = class 1}
  • w*x - b ≤ -1 (yi = -1) {where -1 = class 2}
The two can be combined into a single constraint: yi ( w*x - b ) ≥ 1
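To make the combined form concrete, here is a quick check with a made-up separator and one point on each margin boundary; yi(w*x - b) comes out as exactly 1 for both:

```python
import numpy as np

w = np.array([1.0, 1.0])   # hypothetical weights
b = 1.0                    # hypothetical bias

# One made-up point on each boundary line, paired with its label yi
points = [(np.array([1.5, 0.5]), +1),   # lies on w*x - b = +1
          (np.array([0.0, 0.0]), -1)]   # lies on w*x - b = -1

for x, yi in points:
    print(yi * (np.dot(w, x) - b))      # prints 1.0 for both boundary points
```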

Finding the Width of the Margin:
We know that w*x - b = +1 and w*x - b = -1 are the lines passing through the support vectors. Let's say that x1 is a support vector for class +1 and x2 a support vector for class -1. Then the two equations become w*x1 - b = +1 and w*x2 - b = -1.

If we treat these as simultaneous equations and subtract the second from the first, we get w*(x1 - x2) = 2, and dividing both sides by the magnitude of w gives the distance between the two margin boundaries:
(w*(x1 - x2))/||w|| = 2/||w||

Margin:  2/||w||

As we are trying to maximize the margin, we need to maximize 2/||w||, which is the same as minimizing ||w||. For convenience in the optimization, this is usually rewritten as minimizing 0.5||w||^2, which has the same minimizer but is differentiable everywhere.
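As a small numeric illustration of both quantities (the weight vector here is made-up):

```python
import numpy as np

w = np.array([3.0, 4.0])            # hypothetical weights, so ||w|| = 5
print(2 / np.linalg.norm(w))        # margin 2/||w|| = 0.4
print(0.5 * np.linalg.norm(w)**2)   # 0.5||w||^2 = 12.5, what we minimize instead
```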

To find the optimal values of w and b, the soft-margin SVM minimizes:

min over (w, b):  0.5||w||^2 + C * Σ ξi

where C is the regularization parameter that sets how heavily margin violations are penalized
where Σ ξi is the total error: each slack ξi measures how far point i falls past its margin boundary
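Below is a minimal sketch of optimizing this objective with subgradient descent on the equivalent hinge-loss form ξi = max(0, 1 - yi(w*xi - b)); the blob data, learning rate, and epoch count are made-up illustration choices, not part of the derivation above:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: two Gaussian blobs, one per class
X = np.vstack([rng.normal(+2.0, 1.0, (50, 2)),
               rng.normal(-2.0, 1.0, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

C, lr, epochs = 1.0, 0.01, 200
w, b = np.zeros(2), 0.0

for _ in range(epochs):
    for x_i, y_i in zip(X, y):
        if y_i * (np.dot(w, x_i) - b) >= 1:
            # Constraint satisfied (slack = 0): only the 0.5||w||^2 term pulls on w
            w -= lr * w
        else:
            # Slack > 0: the error term C*ξi also contributes to the gradient
            w -= lr * (w - C * y_i * x_i)
            b -= lr * C * y_i

print("w =", w, " b =", b)
print("margin width =", 2 / np.linalg.norm(w))
```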

C is a regularization parameter that allows the model to tolerate a certain number of outliers. In the image below, two points sit on the wrong side of the hyperplane, yet depending on the value of C the model will not contort itself to divide the points perfectly. This matters because it helps prevent the model from overfitting.
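One way to see the effect of C in practice is with scikit-learn's linear SVM; the overlapping data here is made-up, and the small vs. large C values are chosen purely for contrast:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Made-up overlapping blobs so some points land on the wrong side
X = np.vstack([rng.normal(+1.0, 1.5, (50, 2)),
               rng.normal(-1.0, 1.5, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    # Small C -> wider margin, more tolerated violations; large C -> narrower margin
    print(f"C={C}: margin width = {2 / np.linalg.norm(w):.3f}, "
          f"support vectors = {len(clf.support_)}")
```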

Each slack ξi can be thought of as the distance by which a point falls past its margin boundary (the lines w*x - b = +1 and w*x - b = -1). As the image further below shows, the magnitude of each vector ξ is the value that gets added to the total error.
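Here is a sketch of computing these slack values directly from the hinge expression ξi = max(0, 1 - yi(w*xi - b)), again with made-up parameters and points:

```python
import numpy as np

w, b = np.array([1.0, 0.0]), 0.0   # hypothetical separator

# Three made-up points: well-classified, inside the margin, misclassified
X = np.array([[ 3.0, 0.0],
              [ 0.5, 0.0],
              [-1.0, 0.0]])
y = np.array([1.0, 1.0, 1.0])

xi = np.maximum(0.0, 1.0 - y * (X @ w - b))
print(xi)  # [0.  0.5 2. ] -- zero slack, margin violation, misclassification
```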

[Image: soft-margin SVM with slack vectors ξ for points on the wrong side of their margin boundary]
