Overfitting vs Underfitting

Hello! In this post, we will be exploring the concepts of Overfitting and Underfitting.

Overfitting:

Overfitting is a modelling error that occurs when the model fits the training data too closely, learning the noise in the data along with the underlying pattern.

[Image: an overfitted model curving through nearly every training point]

As you can see above, the overfitted model fits the training data almost perfectly. The model's cost on the training data would be near 0; however, its accuracy would be poor when used on testing data.
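
To make this concrete, here is a minimal sketch (my own illustration, assuming NumPy and scikit-learn are available; the sine-plus-noise data is made up for the example): a degree-15 polynomial fit to just 10 noisy points drives the training error toward 0, while the test error stays high.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: a sine curve plus noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(20, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, size=20)
X_train, y_train, X_test, y_test = X[:10], y[:10], X[10:], y[10:]

# A degree-15 polynomial has more parameters than training points,
# so it can pass through (almost) every training point.
overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
overfit.fit(X_train, y_train)

# Training cost is near 0, but the error on unseen test data is
# typically much larger: the signature of overfitting.
print("train MSE:", mean_squared_error(y_train, overfit.predict(X_train)))
print("test MSE:", mean_squared_error(y_test, overfit.predict(X_test)))
```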

Underfitting:

Underfitting is when the model fits the training data too simply; the model isn't complex enough to adequately capture the trend/pattern of the training data.

[Image: a straight line fit to curved training data, missing the trend]

As you can see from the image above, the model is just a straight line and does not fit the training data well; it is too simple to capture the trend of the data.
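
And here is the opposite case, using the same assumed tooling and synthetic data as the sketch above: a plain linear model can't bend to follow the curve, so its error is large even on the very data it was trained on.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Same kind of curved data as in the overfitting sketch.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(20, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, size=20)

# A straight line can't represent the sine shape, so the error is
# large even on the training data: the signature of underfitting.
underfit = LinearRegression().fit(X, y)
print("train MSE:", mean_squared_error(y, underfit.predict(X)))
```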
