Updated: Sep 11, 2020
As I promised in my first ever blog post, “What is ‘Line of best fit’ in linear regression?”, I am back to explain a commonly used method for finding the “Line of best fit” for a linear regression model.
The most common method that regression analysis uses to generate the best fitting line is the “Least squares method”.
The least squares method is a statistical procedure that finds the best fit for a set of data points by minimizing the sum of the squared errors (residuals) between the points and the plotted line.
The fitted regression line enables us to predict the response, Y`, for a given value of X using:
Y` = bX + A
In the above equation, how do we determine the values of the intercept (A) and the slope (b) of our regression line? If we were to draw a line through the data points manually, we would try to draw one that minimizes the errors overall. But whenever we fit a line through data, some of the observed values (Y) will be larger than their predicted values (Y`) and fall above the line, and some of the observed values (Y) will be smaller than their predicted values (Y`) and fall below the line.
The differences between the observed and predicted values are called prediction errors (or residuals) and are calculated as:
E = Y – Y`
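To make the prediction errors concrete, here is a minimal Python sketch. The data points, the slope `b` and the intercept `A` are made-up values used only for illustration; they are not the data from this post, and the helper names `predict` and `prediction_errors` are my own.

```python
# Hypothetical sample data (not from the post): observed X and Y values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Hypothetical slope (b) and intercept (A) of an example line.
b, A = 0.6, 2.2


def predict(x, b, A):
    """Predicted response Y` = bX + A."""
    return b * x + A


def prediction_errors(xs, ys, b, A):
    """Prediction error E = Y - Y` for every data point."""
    return [y - predict(x, b, A) for x, y in zip(xs, ys)]


errors = prediction_errors(xs, ys, b, A)
for x, y, e in zip(xs, ys, errors):
    if e > 0:
        side = "above the line"
    elif e < 0:
        side = "below the line"
    else:
        side = "on the line"
    print(f"X={x}, Y={y}, E={e:+.2f} ({side})")
```

Some errors come out positive and some negative, which is exactly the situation described above.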
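If we simply add up all of the errors, the positive and negative values cancel each other out (for the least squares line, the sum is exactly zero). So how do we measure overall error? Let’s use a simple trick: we square the errors and find the line that minimizes the sum of the squared errors:
Sum of squared errors = Σ E² = Σ (Y – Y`)²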
This method, the method of least squares, finds the values of the intercept and slope that minimize the sum of the squared errors, thus giving us the equation of the best fitting line.
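For a single predictor X, the minimizing slope and intercept have a well-known closed form: b = Σ(X – mean X)(Y – mean Y) / Σ(X – mean X)², and A = mean Y – b × mean X. The sketch below applies it to made-up data (not the data from this post); the function names are my own, and it assumes the X values are not all identical.

```python
def least_squares_fit(xs, ys):
    """Return (b, A) minimizing the sum of squared errors for Y` = bX + A.

    Assumes xs and ys have equal length and xs are not all the same value.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    A = mean_y - b * mean_x
    return b, A


def sum_squared_errors(xs, ys, b, A):
    """Sum of (Y - Y`)^2 over all data points."""
    return sum((y - (b * x + A)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b, A = least_squares_fit(xs, ys)
residuals = [y - (b * x + A) for x, y in zip(xs, ys)]
sse = sum_squared_errors(xs, ys, b, A)

print(f"slope b = {b:.3f}, intercept A = {A:.3f}")
print(f"sum of errors = {sum(residuals):.6f}")
print(f"sum of squared errors = {sse:.3f}")
```

For the fitted line the plain sum of the errors comes out as zero (up to floating-point rounding), which is why the squared errors are the quantity worth minimizing.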
To illustrate the concept of least squares, let us take a sample data set and two candidate line equations (Y1 and Y2), and find which of the two lines (green and red) plotted below fits the data better.
Let's see how to pick the better of the two lines (Y1 and Y2) in the above chart! The following two tables show the least squares calculation for each of the two lines in the chart, the red line and the green line.
We can see from these two tables that the calculated value (the sum of squared errors) is 0.37 for function Y1 and 0.41 for function Y2. A smaller value means a better fitting function, so Y1 is the better option for the given data set. We can conclude from these calculations that Y1 (green line) is the better fitting of the two lines in the chart.
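The same comparison can be sketched in a few lines of Python. Note that the data points and the two candidate lines here are hypothetical stand-ins, so the numbers will not reproduce the 0.37 and 0.41 from the tables; only the procedure is the same.

```python
def sum_squared_errors(xs, ys, b, A):
    """Sum of (Y - Y`)^2 for the line Y` = bX + A."""
    return sum((y - (b * x + A)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Two hypothetical candidate lines, given as (slope b, intercept A).
candidates = {
    "Y1": (0.5, 2.5),
    "Y2": (1.0, 1.0),
}

# Score each candidate by its sum of squared errors.
scores = {
    name: sum_squared_errors(xs, ys, b, A)
    for name, (b, A) in candidates.items()
}

# The candidate with the smallest sum of squared errors fits best.
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: sum of squared errors = {score:.2f}")
print(f"Better fitting line: {best}")
```

This only ranks the candidates it is given; the true least squares line is the one that beats every possible candidate.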
I hope I have explained what the least squares method is and how it works in a simple enough way.
Understanding the importance of regression analysis and the advantages of linear regression can help any business gain a far greater understanding of the factors that can impact its future success, and of the relationships between them.
Thanks for reading!