# Regression Algorithm Part 4: Support Vector Regression Using R Language

# What is Support Vector Regression?

**Support Vector Machine** is a supervised **machine learning** algorithm that can be used for **regression** or **classification** problems. It can solve linear and non-linear problems and **works** well for many practical problems. It uses a technique called the **kernel trick** to transform the data, and based on these transformations it finds an optimal boundary between the possible outputs.

Let’s understand Support Vector Regression using the **Position_Salaries** data set, which is available on Kaggle. This data set consists of a list of positions in a company along with their band levels and associated salaries. It includes a Position column with values ranging from Business Analyst and Junior Consultant to CEO, a Level column ranging from 1 to 10, and finally the Salary associated with each position, ranging from $45,000 to $1,000,000.
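If you don’t have the Kaggle file handy, a few rows in the same shape are enough to follow along. This is only an illustrative sketch; the Position names, levels, and salaries below are made up to mimic the structure of **Position_Salaries**, not copied from the actual file.

```r
# Illustrative stand-in for Position_Salaries (values are made up)
dataset <- data.frame(
  Position = c('Business Analyst', 'Junior Consultant', 'Country Manager', 'CEO'),
  Level    = c(1, 2, 5, 10),
  Salary   = c(45000, 50000, 110000, 1000000)
)

str(dataset)  # three columns: Position, Level, Salary
```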

# Required R package

First, you need to install the e1071 and ggplot2 packages and load their libraries; after that, you can perform the following operations. So let’s start implementing our non-linear regression model.

**Import libraries**

```
install.packages('e1071')
install.packages('ggplot2')
library(e1071)
library(ggplot2)
```

*Note: If you use RStudio, packages need to be installed only once.*

**Importing the dataset**

```
dataset <- read.csv('../input/position-salaries/Position_Salaries.csv')
dataset <- dataset[2:3]
dim(dataset)
```

The read.csv() function reads the CSV file, and the dim() function tells us how many rows and columns it contains. In this data set, the Position and Level columns carry the same information, so we keep only the Level column. Also, because the data set is very small, we don’t split it into training and test sets.

The problem statement is as follows: a candidate with level 6.5 claims a previous salary of $160,000. Before hiring the candidate for a new role, the company would like to confirm whether he is being honest about his last salary so it can make a hiring decision. To do this, we will use **Support Vector Regression** to predict the salary for that level.

**Apply Support Vector Regression to the data set**

`regressor <- svm(formula = Salary ~ ., data = dataset, type = 'eps-regression', kernel = 'radial')`

The svm() function is used to create a Support Vector Regression model. If you look at the data set, we have one dependent variable, Salary, and one independent variable, Level. The notation formula = Salary ~ . means that Salary is modelled as a function of the independent variables; the dot represents all of them. The second argument, data, takes the data set on which you want to train your regression model. The third argument, type, is the most important because it specifies whether you’re building an SVM model for regression or classification. Here, we’re building a non-linear regression model, so we choose the **eps-regression** type. The final argument is kernel; if you don’t specify one, the Gaussian (radial basis) kernel is selected by default. Note that named arguments in a function call use `=`, not the assignment operator `<-`.
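To make those arguments concrete, here is a sketch of the same call with e1071’s main SVR hyperparameters written out explicitly. The tiny data frame is an illustrative stand-in for the Kaggle file, and the cost and epsilon values shown are e1071’s defaults, not tuned choices.

```r
library(e1071)

# Illustrative stand-in for Position_Salaries (values are made up)
dataset <- data.frame(Level = 1:10,
                      Salary = c(45000, 50000, 60000, 80000, 110000,
                                 150000, 200000, 300000, 500000, 1000000))

regressor <- svm(formula = Salary ~ ., data = dataset,
                 type    = 'eps-regression',  # epsilon-SVR, i.e. regression
                 kernel  = 'radial',          # Gaussian (RBF) kernel, the default
                 cost    = 1,                 # penalty for points outside the tube
                 epsilon = 0.1)               # half-width of the insensitive tube

summary(regressor)  # reports kernel, cost, epsilon, and support-vector count
```

Raising cost makes the model track the training points more closely; widening epsilon lets more points sit inside the tube without penalty.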

**Predicting a new result with Support Vector Regression**

`y_pred <- predict(regressor, data.frame(Level = 6.5))`

This code predicts the salary associated with level 6.5 according to the Support Vector Regression model, and it gives us a prediction close to $160,000, so it’s a pretty good prediction.

**Visualize the Support Vector Regression results**

```
ggplot() +
  geom_point(aes(x = dataset$Level, y = dataset$Salary), colour = 'red') +
  geom_line(aes(x = dataset$Level, y = predict(regressor, dataset)), colour = 'blue') +
  ggtitle('Support Vector Regression') +
  xlab('Level') +
  ylab('Salary')
```

The Support Vector Regression model is represented by the blue curve, which fits the data well because the predicted points are very close to the real observations, except for the outlier.
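Because the data set has only ten levels, the blue curve above is drawn through just ten predicted points. A common refinement is to predict on a denser grid of levels so the curve looks smooth. The sketch below is self-contained, using an illustrative made-up stand-in for the Kaggle data:

```r
library(e1071)
library(ggplot2)

# Illustrative stand-in for Position_Salaries (values are made up)
dataset <- data.frame(Level = 1:10,
                      Salary = c(45000, 50000, 60000, 80000, 110000,
                                 150000, 200000, 300000, 500000, 1000000))
regressor <- svm(formula = Salary ~ ., data = dataset,
                 type = 'eps-regression', kernel = 'radial')

# Predict on a 0.1-step grid instead of the ten integer levels
x_grid <- seq(min(dataset$Level), max(dataset$Level), by = 0.1)

ggplot() +
  geom_point(aes(x = dataset$Level, y = dataset$Salary), colour = 'red') +
  geom_line(aes(x = x_grid,
                y = predict(regressor, data.frame(Level = x_grid))),
            colour = 'blue') +
  ggtitle('Support Vector Regression (smoothed curve)') +
  xlab('Level') +
  ylab('Salary')
```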

The code is available on my **GitHub** account.

The previous parts of the series, **part 1**, **part 2**, and **part 3**, covered Linear Regression, Multiple Linear Regression, and Polynomial Regression.

If you like the blog or found it helpful please leave a clap!

**Thank you.**