There are many blog posts about linear regression that explain individual concepts well, and there are advanced textbooks that cover the model in great (and sometimes hard to follow) detail. The goal here is to strike a balance between the two, with non-technical intuitions included. If you want a better-than-average overview of what linear regression is, here it is! It should also be useful to anyone interested in getting into Machine Learning or Data Science. So, without any delay, let's get started.
Linear Regression, yeah, you heard it right! Not just the "equation". I know that's the very first thing that comes to mind, but in ML (Machine Learning) linear regression is built on that same mathematics.
So, let's see what Machine Learning and Linear Regression are.
Machine Learning is a sub-area of artificial intelligence that enables IT systems to recognize patterns in data sets using existing algorithms and to develop suitable solutions from them. In other words, in Machine Learning, artificial knowledge is generated on the basis of experience. Machine Learning works in a similar way to human learning. For example, if a child is shown images with specific objects on them, they can learn to identify and differentiate between them. Machine Learning works in the same way: through data input and certain commands, the computer is enabled to "learn" to identify certain objects (people, things, etc.) and to distinguish between them. For this purpose, the software is supplied with data and trained. For instance, the programmer can tell the system that a particular object is a cat (="cat") and another object is not a cat (="not cat"). The software receives continuous feedback from the programmer. With each new data set fed into the system, the model is further optimized so that, in the end, it can clearly distinguish between "cat" and "not cat".
Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical method that is used for predictive analysis. Linear regression makes predictions for continuous (real-valued, numeric) variables such as sales, salary, age, product price, etc.
The linear regression algorithm models a linear relationship between a dependent variable (Y-axis) and an independent variable (X-axis), hence the name linear regression. Because the relationship is linear, the model describes how the value of the dependent variable changes as the value of the independent variable changes.
The linear regression model provides a sloped straight line representing the relationship between the variables. Consider the below image:
Mathematically, we can represent a linear regression as:
y = a0 + a1·x + ε
where:
y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error
The observed values of x and y form the training dataset for the linear regression model.
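To make the equation concrete, here is a minimal Python sketch of how a prediction is computed once a0 and a1 are known. The coefficient values and the function name predict are made up purely for illustration; they do not come from any dataset in this post.

```python
def predict(x, a0, a1):
    """Return the model's prediction a0 + a1 * x.

    The random error term is not part of the prediction; it is the
    gap between an observed y and this predicted value.
    """
    return a0 + a1 * x


# Hypothetical coefficients, chosen only for illustration.
a0, a1 = 2.0, 0.5

x_values = [0, 1, 2, 3]
predictions = [predict(x, a0, a1) for x in x_values]
print(predictions)  # [2.0, 2.5, 3.0, 3.5]
```

Each step of 1 in x moves the prediction by a1, and the prediction at x = 0 is the intercept a0.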
A straight line showing the relationship between the dependent and independent variables is called a regression line. A regression line can show two types of relationship:
Positive Linear Relationship: If the dependent variable (Y-axis) increases as the independent variable (X-axis) increases, the relationship is called a positive linear relationship.
Negative Linear Relationship: If the dependent variable (Y-axis) decreases as the independent variable (X-axis) increases, the relationship is called a negative linear relationship.
When working with linear regression, our main goal is to find the best fit line, meaning the line for which the error between the predicted values and the actual values is as small as possible. The best fit line is the one with the least error.
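One common way to measure that error is the sum of squared differences between actual and predicted values. This post does not say which error measure it has in mind, so treat the choice below as an assumption. The sketch compares two candidate lines on a small made-up dataset; the data and the helper name sum_squared_error are hypothetical.

```python
def sum_squared_error(xs, ys, a0, a1):
    """Sum of squared differences between actual y and predicted a0 + a1 * x."""
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data, used only to compare two candidate lines.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 10]

sse_line_a = sum_squared_error(xs, ys, a0=1, a1=2)  # candidate line y = 1 + 2x
sse_line_b = sum_squared_error(xs, ys, a0=0, a1=2)  # candidate line y = 2x

print(sse_line_a, sse_line_b)  # 1 7
```

Line A has the smaller total error, so of these two candidates it is the better fit. Finding the best fit line means finding the a0 and a1 that make this number as small as possible.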
So, let's take a look at the example below for a better understanding:
We want to examine the relationship between the age and the selling price of used mobile phones.
Here is the table of the data:
Mobile age (years)    Selling price (₹)
0.6                   9500
1                     6800
2                     4700
4                     3500
6                     2500
Now we can see that there is a negative relationship between price (Y-axis) and mobile age (X-axis): as mobile age increases, price decreases.
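If you would like to check the direction of the relationship numerically, one option is the Pearson correlation coefficient, which is negative when one variable tends to fall as the other rises. The post itself does not compute it, so this is only a supplementary sketch using the five data points from the table; the variable names are my own.

```python
import math

# Mobile age in years and selling price, from the data above.
ages = [0.6, 1, 2, 4, 6]
prices = [9500, 6800, 4700, 3500, 2500]

mean_age = sum(ages) / len(ages)
mean_price = sum(prices) / len(prices)

# Pearson correlation: co-variation of x and y divided by the product
# of their individual spreads.
s_xy = sum((x - mean_age) * (y - mean_price) for x, y in zip(ages, prices))
s_xx = sum((x - mean_age) ** 2 for x in ages)
s_yy = sum((y - mean_price) ** 2 for y in prices)
correlation = s_xy / math.sqrt(s_xx * s_yy)

print(f"correlation = {correlation:.3f}")
```

The result is close to -0.89: a value near -1 indicates a strong negative linear relationship, which agrees with what we read off the data.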
Let's use the data from the table to create a scatter plot and a linear regression line using http://endmemo.com/statistics/lr.php:
In the graph above, the X-axis (horizontal) shows the independent variable, which is the mobile age. The Y-axis (vertical) shows the dependent variable, which is the price.
So, the earlier the phone is sold, the better the selling price you get, and the two are inversely correlated. This correlation is displayed by the blue line in the graph, which is called the best fit line because it best describes the relationship among the scattered points.
The intercept we got is 8410.054988 and the slope is -1106.637863. We can conclude that the average mobile price decreases by about ₹1106.64 for each additional year of phone age. So, the best fit line is y = -1106.637863x + 8410.054988.
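These numbers can be reproduced by hand. The sketch below assumes the online tool uses ordinary least squares, the standard method for simple linear regression; the post does not state this, but the results match. It uses only the five data points from the table, and the 3-year-old phone at the end is a hypothetical query added for illustration.

```python
# Mobile age in years (x) and selling price in rupees (y), from the data above.
ages = [0.6, 1, 2, 4, 6]
prices = [9500, 6800, 4700, 3500, 2500]

n = len(ages)
mean_age = sum(ages) / n
mean_price = sum(prices) / n

# Ordinary least squares for a single predictor:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
#   intercept = mean_y - slope * mean_x
s_xy = sum((x - mean_age) * (y - mean_price) for x, y in zip(ages, prices))
s_xx = sum((x - mean_age) ** 2 for x in ages)
slope = s_xy / s_xx
intercept = mean_price - slope * mean_age

print(f"slope = {slope:.6f}, intercept = {intercept:.6f}")

# Hypothetical query: estimated selling price of a 3-year-old phone.
estimate_3_years = intercept + slope * 3
print(f"estimated price at 3 years = {estimate_3_years:.2f}")
```

The computed slope and intercept agree with the values reported above, and plugging x = 3 into the line gives an estimate of roughly ₹5090.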
Linear regression has become extremely helpful in business, digital customer experience, and many other areas for estimating sales, production, and supply.
I hope this helps you get a clear-cut understanding of what machine learning and linear regression are from a beginner's point of view.
~ ~ ~