

Ridge and Lasso Regression


Hello! Today we will be exploring two regression algorithms, Ridge Regression and Lasso Regression, both of which build on how Linear Regression works.

Before we get into how Lasso and Ridge regression models work, you need to understand what is meant by the term 'overfitting'.

Overfitting:

Overfitting refers to a model that has been trained to fit the training data too closely.




For example, a strong indicator that your model is overfitting is when the total cost on your training data is close to zero while the cost on your test data is huge. This hurts the model in practice, because it cannot predict accurately on data it has not seen.
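
Here is a quick, hypothetical sketch of that symptom (the toy dataset, polynomial degree, and train/test split below are all made up for illustration): an over-complex model whose training error is near zero but whose test error is much larger.

```python
# Hypothetical sketch: an over-complex polynomial fit whose training error
# is near zero while its test error is much larger (a sign of overfitting).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))            # small, noisy toy dataset
y = np.sin(X).ravel() + rng.normal(0, 0.3, 30)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A degree-15 polynomial on ~21 training points is flexible enough to memorize noise.
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)

print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))  # near zero
print("test  MSE:", mean_squared_error(y_test, model.predict(X_test)))    # much larger
```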

I will explore the concepts of overfitting and underfitting in more detail in future posts.

Ridge Regression:

Like I said at the beginning of this post, the Ridge regression model works very similarly to the Linear regression model. As some of you may already know, the cost function used in Linear regression is Mean Squared Error (MSE). However, the cost function for Ridge regression is a little different.

RSS (Residual Sum of Squares):
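
For reference, here is the standard definition of RSS, where y_i is the true value, ŷ_i is the model's prediction for example i, and n is the number of training examples:

```latex
RSS = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```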



Ridge Regression Cost:
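
Written out with the same notation, the Ridge cost adds a weight penalty to the RSS, where the w_j are the model's weights, p is the number of features, and lambda controls the strength of the penalty:

```latex
\text{Ridge Cost} = \underbrace{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}_{\text{RSS}} + \lambda \sum_{j=1}^{p} w_j^{2}
```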



In Ridge Regression, we add lambda times the sum of the squared weights to the RSS cost of the model. This penalty is what prevents the model from overfitting. An RSS of zero would indicate that the model fits the training data perfectly; by adding lambda times the sum of the squared weights, the total cost stays above zero whenever the weights are non-zero, so the model cannot settle on a perfect fit to the training data. Because the penalty grows with the size of the weights, minimizing the combined cost pushes the model to shrink its weights rather than fit the training data exactly.
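
To make the cost function concrete, here is a minimal sketch that computes the Ridge cost from its two pieces; the targets, predictions, weights, and lambda value below are made up purely for illustration.

```python
# Minimal sketch of the Ridge cost: RSS plus lambda times the sum of squared weights.
# All numbers below are made up for illustration.
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # predictions from some fitted model
weights = np.array([0.8, -1.2, 0.4])      # the model's learned weights (bias excluded)
lam = 0.5                                  # lambda > 0 controls the penalty strength

rss = np.sum((y_true - y_pred) ** 2)
ridge_penalty = lam * np.sum(weights ** 2)
ridge_cost = rss + ridge_penalty

print(rss, ridge_penalty, ridge_cost)      # the penalty keeps the cost above plain RSS
```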

Lambda: a value greater than 0; the optimal value can be found using cross-validation.
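
As one way to pick lambda by cross-validation, here is a sketch using scikit-learn's RidgeCV; note that scikit-learn calls the lambda parameter alpha, and the candidate values and synthetic data below are arbitrary choices for illustration.

```python
# Sketch: choosing lambda (called alpha in scikit-learn) by cross-validation.
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

# RidgeCV tries each candidate alpha with cross-validation and keeps the best one.
ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0], cv=5)
ridge.fit(X, y)

print("best lambda (alpha):", ridge.alpha_)
print("learned weights:", ridge.coef_)
```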

Lasso Regression:

Lasso Regression works in a similar fashion to Ridge Regression; however, it adds lambda times the sum of the absolute values of the weights to the RSS.
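
Written out with the same notation as the Ridge cost above, the Lasso cost replaces the squared weights in the penalty with their absolute values (which is why Lasso can shrink some weights all the way to zero):

```latex
\text{Lasso Cost} = \underbrace{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}_{\text{RSS}} + \lambda \sum_{j=1}^{p} \left| w_j \right|
```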





Lambda: a value greater than 0; the optimal value can be found using cross-validation.
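
As with Ridge, one way to pick lambda for Lasso by cross-validation is scikit-learn's LassoCV; again, lambda is called alpha there, and the data below is synthetic and only for illustration.

```python
# Sketch: Lasso with lambda (alpha in scikit-learn) chosen by cross-validation.
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

lasso = LassoCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=5)
lasso.fit(X, y)

print("best lambda (alpha):", lasso.alpha_)
print("learned weights:", lasso.coef_)   # some weights may be driven exactly to zero
```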
