Bias and Variance Trade-off





For any model to perform well, its prediction error needs to be reduced. Striking the right balance between bias and variance is important when building a machine-learning model that produces accurate results. Bias and variance are discussed in the context of supervised machine learning, in which an algorithm learns from training data, a sample data set with known outcomes. Together they make up the reducible part of a model's error.


Bias

Bias is how far, on average, our predicted values are from the actual values. We say the bias is high if our predictions are consistently far off from the actual values. A model with high bias usually shows a large error even on the training data. High bias can cause an algorithm to miss the relationship between the input and output variables: the model is oversimplified, which leads to underfitting.

Let’s take an example and see what bias is.

Let’s say we want to predict the height of students using linear regression.



In the picture above we can clearly see that the straight line doesn’t have the flexibility to capture the true relationship, no matter how well we fit it to the training data. The inability of a machine learning method to capture the true relationship is called bias.
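To make this concrete, here is a minimal sketch in Python. It does not use the student height data from the picture; instead it assumes a made-up curved (quadratic) relationship, and the helper name `fit_and_score` is hypothetical. It compares how well a straight line and a curve can fit the same training data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic data: a curved relationship plus a little noise.
x = np.linspace(-3, 3, 200)
y = x**2 + rng.normal(0, 0.5, size=x.shape)

def fit_and_score(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, deg=degree)
    predictions = np.polyval(coeffs, x)
    return float(np.mean((y - predictions) ** 2))

mse_line = fit_and_score(1)   # straight line: too rigid for this data
mse_curve = fit_and_score(2)  # quadratic: matches the shape of the data

print(f"Training MSE, straight line: {mse_line:.3f}")
print(f"Training MSE, quadratic:     {mse_curve:.3f}")
```

Under these assumptions the straight line keeps a large training error however it is fitted, which is what high bias looks like in practice.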




Variance

When a model does not perform as well on new data as it does on the training data set, the model likely has high variance. Variance shows up as error on the test data: it tells you how much the model's predictions change, or scatter, when it is trained on different samples of data. A model with high variance has learned the noise and irrelevant details in its training data, which leads to overfitting.
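The gap between training error and test error can be seen in a small experiment. The sketch below assumes synthetic data drawn from a sine curve and a deliberately flexible degree-10 polynomial; the data, the noise level and the names `true_fn` and `flexible` are illustrative choices, not part of the original example.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    # Assumed underlying relationship, chosen only for illustration.
    return np.sin(3 * x)

# A small, noisy training set and a larger test set from the same source.
x_train = rng.uniform(-1, 1, 15)
y_train = true_fn(x_train) + rng.normal(0, 0.3, 15)
x_test = rng.uniform(-1, 1, 1000)
y_test = true_fn(x_test) + rng.normal(0, 0.3, 1000)

# A degree-10 polynomial has enough freedom to chase the noise in 15 points.
flexible = Polynomial.fit(x_train, y_train, deg=10)

train_error = float(np.mean((y_train - flexible(x_train)) ** 2))
test_error = float(np.mean((y_test - flexible(x_test)) ** 2))

print(f"Training MSE: {train_error:.3f}")
print(f"Test MSE:     {test_error:.3f}")
```

A training error that is much lower than the test error is the usual sign of a high-variance, overfitted model.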


How Do Bias and Variance Affect the Model



Image endtoend.ai


High Variance and High Bias: The model is inconsistent and also inaccurate on average.

Low Variance and High Bias: The model is consistent but inaccurate on average.

High Variance and Low Bias: The model is accurate on average but inconsistent.

Low Variance and Low Bias: This is the ideal case, where the model is both consistent and accurate on average.


Bias-Variance Trade-off



Image from datarobot


The bias-variance trade-off is the idea of finding the right balance between bias and variance so that our model is neither overfitted nor underfitted. If the model is too simple and has very few parameters, it will suffer from high bias and low variance. On the other hand, if the model has a large number of parameters, it will tend to have high variance and low bias. The goal of the trade-off is the level of complexity at which the combined error from bias and variance is smallest.
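One way to look for that balance is to sweep the model complexity and watch the training and test errors. The sketch below is an illustration under assumptions: the data is synthetic (a noisy sine curve), polynomial degree stands in for the number of parameters, and names such as `best_degree` are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def true_fn(x):
    # Assumed underlying relationship, chosen only for illustration.
    return np.sin(3 * x)

x_train = rng.uniform(-1, 1, 30)
y_train = true_fn(x_train) + rng.normal(0, 0.3, 30)
x_test = rng.uniform(-1, 1, 1000)
y_test = true_fn(x_test) + rng.normal(0, 0.3, 1000)

degrees = list(range(1, 13))  # polynomial degree as a proxy for complexity
train_errors, test_errors = [], []
for degree in degrees:
    model = Polynomial.fit(x_train, y_train, deg=degree)
    train_errors.append(float(np.mean((y_train - model(x_train)) ** 2)))
    test_errors.append(float(np.mean((y_test - model(x_test)) ** 2)))

# The degree with the lowest test error is the best balance found in this sweep.
best_degree = degrees[int(np.argmin(test_errors))]

for degree, tr, te in zip(degrees, train_errors, test_errors):
    print(f"degree {degree:2d}  train MSE {tr:.3f}  test MSE {te:.3f}")
print("Lowest test error at degree", best_degree)
```

Training error keeps falling as the degree grows, while test error typically falls at first and then rises again once the model starts fitting noise. In real projects the test set would normally be replaced by a validation set or cross-validation when choosing the complexity.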


Concept of Bias and Variance Mathematically

Let the variable that we are predicting be Y and the independent variables be X. Let us assume that there is a relationship between them such that

Y = f(X) + e

where e is the error term, with a mean value of 0. We then build a model of f using an algorithm such as linear regression or a support vector machine. The expected squared error at a point x is the squared bias plus the variance plus the irreducible error.

Total Error = Bias²+Variance+Irreducible error
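This decomposition can be checked numerically. The sketch below assumes a known f (a sine curve), a known noise level `sigma` and a straight-line model, all picked purely for illustration. It trains the model on many fresh training sets, measures each term at a single point `x0`, and compares their sum with the squared error measured directly.

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x):
    # Assumed true relationship; in real problems f is unknown.
    return np.sin(3 * x)

sigma = 0.3      # standard deviation of the error term e (assumed)
x0 = 0.5         # the point at which the error is decomposed
n_train = 30
n_trials = 5000

predictions = np.empty(n_trials)
squared_errors = np.empty(n_trials)
for trial in range(n_trials):
    # Draw a fresh training set and fit a straight line to it.
    x = rng.uniform(-1, 1, n_train)
    y = f(x) + rng.normal(0, sigma, n_train)
    coeffs = np.polyfit(x, y, deg=1)
    predictions[trial] = np.polyval(coeffs, x0)
    # Draw a fresh observation at x0 and record the squared error.
    y0 = f(x0) + rng.normal(0, sigma)
    squared_errors[trial] = (y0 - predictions[trial]) ** 2

bias_squared = (predictions.mean() - f(x0)) ** 2
variance = predictions.var()
irreducible = sigma ** 2
total = bias_squared + variance + irreducible
measured = squared_errors.mean()

print(f"Bias^2 {bias_squared:.4f} + Variance {variance:.4f} "
      f"+ Irreducible {irreducible:.4f} = {total:.4f}")
print(f"Measured expected squared error: {measured:.4f}")
```

The two numbers should agree up to sampling noise. Because the straight line is too simple for a sine curve, most of the reducible error in this setup comes from the bias term rather than the variance.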



Total error

To build any good model we need the right balance between Bias and Variance in a way that it minimizes total error.



Image ResearchGate




A model with well-balanced bias and variance neither underfits nor overfits. It is essential for any data scientist to understand bias and variance in order to reduce errors and build accurate models.


Thank you for reading 😊


© Numpy Ninja.