
Linear Regression



As you can see from the image above, a line of best fit is drawn on a scatter plot. The line of best fit is essentially the Linear Regression model, and the blue data points are the training data.

To fit the linear regression model to the data points (the training data), we need to calculate the line of best fit.

Linear Regression Model:
output = weight * input + bias   (where "weight" and "input" are vectors, "*" is their dot product, and "bias" is a single number)

To understand how the linear regression model works, you need to know what the terms "weight" and "bias" mean. You can think of the "weight" vector and the "bias" as the terms that are adjusted so that the line becomes a line of best fit for the training data points (like the gradient and the y-intercept in a linear function).
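To make the model equation concrete, here is a minimal sketch in Python with NumPy. The function name `predict` and the sample values are made up for illustration; they are not from a particular library.

```python
import numpy as np

def predict(inputs, weight, bias):
    """Compute output = weight * input + bias for every row of `inputs`."""
    # `inputs` has one row per data point; `weight` has one entry per feature.
    return inputs @ weight + bias

# Two data points with two features each (made-up values).
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
w = np.array([0.5, -1.0])   # hypothetical weight vector
b = 2.0                     # hypothetical bias

outputs = predict(X, w, b)
print(outputs)
```

With a single input feature, `weight` is just the gradient of the line and `bias` is its y-intercept.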

Loss Function: Mean Squared Error 

In order to calculate a line of best fit, we will need to find the weight and the bias of the model. 

Before we get into an optimization algorithm to calculate the weight and the bias of our model, we need to take a look at what a loss function is. A loss function (or cost function) measures the performance of your model: it calculates how inaccurate your model is on the data.

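The loss function used here is called "Mean Squared Error" (MSE):

MSE = (1/n) x (sum over all data points of (target output - model output)^2)   (where n is the number of data points)

For each data point, this function squares the difference between the target output and the output obtained from the current model. It then sums up these squared differences over all of the data points and divides by the number of data points to get the average loss.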
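The description above can be sketched in a few lines of Python. The function name `mean_squared_error` and the numbers are my own, chosen only for illustration.

```python
import numpy as np

def mean_squared_error(targets, outputs):
    """Average of the squared differences between targets and model outputs."""
    targets = np.asarray(targets, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    return np.mean((targets - outputs) ** 2)

targets = [3.0, 5.0, 7.0]   # made-up target outputs
outputs = [2.0, 5.0, 9.0]   # made-up model outputs

# Differences are 1, 0 and -2, so the squares are 1, 0 and 4.
loss = mean_squared_error(targets, outputs)
print(loss)
```

Because the differences are squared, a prediction that is off by 2 is penalised four times as much as one that is off by 1.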

Gradient Descent:

This is the algorithm that is used to find the optimal weight and bias of the model for the training data points.








The image above shows how the cost on the y-axis changes with respect to the "weight" on the x-axis.

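Gradient Descent works by finding the derivative of the cost function with respect to the current weight or bias. The sign of the derivative tells you in which direction the cost increases, so moving the weight or bias in the opposite direction lowers the cost.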
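As a worked example, assuming the cost is the Mean Squared Error and the model has a single input feature (so the weight w is one number), the derivatives work out as follows. The symbols J, w, b, x_i, y_i and n are my own notation for the cost, weight, bias, inputs, target outputs and number of data points.

```latex
J(w, b) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - (w x_i + b) \bigr)^2

\frac{\partial J}{\partial w} = -\frac{2}{n} \sum_{i=1}^{n} x_i \bigl( y_i - (w x_i + b) \bigr)

\frac{\partial J}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \bigl( y_i - (w x_i + b) \bigr)
```

Both derivatives come from the chain rule: the square contributes the factor of 2, and differentiating the inner term with respect to w or b contributes -x_i or -1.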


EQUATION: (where theta can be replaced with the weight or the bias of the model)

Theta(new weight or bias)  = Theta(current weight or bias) - (learning rate) x (derivative of cost function with respect to weight or bias)

This update is repeated many times, for both the weight and the bias. This is how the gradient descent algorithm is applied to optimize the weight and the bias.
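Here is a minimal sketch of the update rule in Python, assuming a Mean Squared Error cost and a model with a single input feature. The function name `gradient_descent`, the default learning rate, the number of steps and the toy data are all my own choices for illustration.

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.05, steps=2000):
    """Fit output = weight * input + bias by repeating the update rule."""
    weight, bias = 0.0, 0.0          # start from an arbitrary guess
    n = len(x)
    for _ in range(steps):
        error = y - (weight * x + bias)          # target minus model output
        grad_w = -2.0 / n * np.sum(x * error)    # d(MSE)/d(weight)
        grad_b = -2.0 / n * np.sum(error)        # d(MSE)/d(bias)
        # Theta(new) = Theta(current) - learning rate x derivative
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias

# Toy data that lies exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

weight, bias = gradient_descent(x, y)
print(weight, bias)   # should end up very close to 2 and 1
```

The loop runs for a fixed number of steps to keep the sketch short; you could instead stop once the cost no longer decreases by a meaningful amount.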

Learning Rate: 
The learning rate is a hyperparameter that controls how rapidly your weight or bias gets corrected.
If your learning rate is too big, each update takes a large step, as you can see from the image above. The weight or bias can then overshoot the minimum (the point with the lowest cost) and bounce back and forth around it, or even move further away from it. On the other hand, if your learning rate is too small, training becomes very time consuming because it takes small incremental steps to reach the minimum. Therefore, it's important to choose a learning rate that is just right.
Also, when a cost function has more than one minimum, using a small learning rate could cause the search to get stuck in a local minimum, preventing the model from reaching the weight and bias that minimize the 'error' as much as possible. The Mean Squared Error cost of a linear regression model is convex (bowl-shaped), so any minimum it reaches is the lowest one; getting stuck is mainly a concern for more complex models.
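To see the effect of the learning rate, here is a small experiment in Python that runs the same number of gradient descent steps with three different learning rates and compares the final cost. The function name `loss_after`, the toy data and the three rates are my own choices; which rates count as "too big" or "too small" depends on your data.

```python
import numpy as np

def loss_after(learning_rate, steps=50):
    """Run gradient descent on toy data and return the final MSE."""
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = 2.0 * x + 1.0                 # data on the line y = 2x + 1
    weight, bias = 0.0, 0.0
    for _ in range(steps):
        error = y - (weight * x + bias)
        # Subtracting the MSE derivative is the same as adding 2 * mean(...).
        weight += learning_rate * 2.0 * np.mean(x * error)
        bias += learning_rate * 2.0 * np.mean(error)
    return np.mean((y - (weight * x + bias)) ** 2)

# Too small, reasonable, and too big for this particular data set.
losses = {lr: loss_after(lr) for lr in (0.001, 0.05, 0.2)}
for lr, loss in losses.items():
    print(f"learning rate {lr}: cost after 50 steps = {loss:.4g}")
```

On this toy data the smallest rate is still far from the minimum after 50 steps, the middle rate gets close to it, and the largest rate overshoots so badly that the cost grows instead of shrinking.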

