Introduction to Machine Learning Algorithms: Simple Linear Regression (for beginners)

When learning about the linear regression model, the first thing that comes to mind is a linear equation. So it does not matter If you are from a computer science background or not, we all have some understanding of this mathematical concept, as anybody who has passed out middle school is capable of understanding the simple linear regression model.


To understand the linear regression, let us first explore the meaning of regression. By definition, regression is a measure of the relation between the mean value of one variable and the corresponding values of other variables. It is a method of modeling the relationship between the dependent variable and independent variables. The most common form of regression analysis is Simple linear regression.


Simple linear regression is a linear regression model with a single dependent variable(y) and one independent variable(x). The adjective simple refers to the fact that the outcome variable is related to a single predictor.




The best-fitting line as depicted in the above graph by the red line is computed based on the simple linear equation :

y = a + bx


There are a few concepts to unpack here:

  • Dependent Variable(y)

  • Independent Variable(x)

  • Intercept(a)

  • Coefficient(b)

Let us take a simple example of the monthly household expenditure and monthly income and try to model it based on the simple linear regression analysis. This would help us to predict the savings in the future.

First, you work on some data.

It is a common practice in a household to log the monthly income and monthly expenditure, somewhere in the computer's spreadsheet, I mean who doesn't.



This data might at this point just looks simpler, and of no much use except to control your expenditure if it rises above your income.

But to plan for the future, it will be very useful data if you apply the linear regression model to it.


When we plot this data the following graph shows up.




Now, as we progress, we can see there is some connection between these two parameters. Obviously, you knew it already but with the accurate data and graph you are sure of the connection and can answer some questions like,


How much money can be set aside for a major expense like en expensive car or your child's college tuition fees in the next 5 years and by how much you need to control your expenses to meet your needs?


In order to answer this question, you'll use the data you've been collecting so far, and use it to predict how much you are going to spend as the income changes. The idea is that you can make estimated guesses about the future based on data from the past — the data points you've been laboriously logging.

You end up with a mathematical model that describes the relationship between income earned and expenses done.

Once that model is defined, you can provide it with new information — how much will you spend in the next five years — and the model will predict how much money you're going to need to save to meet a particular expense in the near future.



The graph that we plotted above would define a relationship between the parameters like this.


This relationship is a simple linear regression and line of best fit is defined by:

y = 0.599757x - 735.368087

48 views0 comments
 

© Numpy Ninja.