This is a quick coding guide to SVM and kernel SVM (K-SVM) using a simple dataset. I will compare the two models using the confusion matrix and the accuracy score.
Let's understand the dataset.
The dataset records whether an individual purchased a car or not, based on age and salary. As part of the explanation I will split it into a training set and a test set. The training set is used to train the model and the test set is used to evaluate it.
I will follow the steps below to arrive at predictions for this data.
SVM & K-SVM :
1.Importing the libraries
2.Importing the datasets
3.Splitting the datasets into the Training set and Test set
4.Feature Scaling
5.Training the SVM model on the Training set
6.Predicting a new result
7.Predicting the Test set results
8.Making the Confusion Matrix
Log in to Kaggle using your login credentials and follow the steps.
Details :-
1.Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
2.Importing the datasets
dataset = pd.read_csv('/content/sample_data/Social_Network_Ads.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
3.Splitting the datasets into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 1)
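To make the effect of test_size = 0.2 concrete, here is a minimal sketch on a placeholder array. The 400-row size is my assumption (it is consistent with the 80 test predictions printed later), and the values are dummy numbers, not the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 400 rows, 2 feature columns (assumed size, dummy values).
X = np.arange(800).reshape(400, 2)
y = np.arange(400) % 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# 20% of the rows go to the test set, the remaining 80% to the training set.
print(X_train.shape, X_test.shape)

# Fixing random_state makes the split reproducible across runs.
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=1
)
print(np.array_equal(X_test, X_test_again))
```

With 400 rows this gives 320 training rows and 80 test rows, and the same random_state always yields the same split.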
4.Feature Scaling :
In this step I standardize the feature variables. The scaler is fitted on the training set only and then applied unchanged to the test set.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
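A small sketch of what the scaling step does, using a handful of made-up age/salary rows (these values are hypothetical, not taken from the dataset). After fit_transform, each training column should have mean close to 0 and standard deviation close to 1, and the test rows are scaled with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical age/salary rows, only for illustration.
X_train = np.array(
    [[25, 40000], [35, 60000], [45, 80000], [55, 120000]], dtype=float
)
X_test = np.array([[30, 87000]], dtype=float)

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learns mean/std from training data
X_test_scaled = sc.transform(X_test)        # reuses the training mean/std

print(X_train_scaled.mean(axis=0))  # close to 0 for each column
print(X_train_scaled.std(axis=0))   # close to 1 for each column

# The same transformation written out by hand.
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_test_scaled, manual))
```

Without this step the salary column, which is thousands of times larger than age, would dominate the distances the SVM works with.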
5.Training the SVM model on the Training set
from sklearn.svm import SVC
kernel = 'linear'  # use 'rbf' for K-SVM
classifier = SVC(kernel = kernel, random_state = 0)
classifier.fit(X_train, y_train)
Note : Set kernel to 'linear' for SVM or to 'rbf' for K-SVM.
6.Predicting a new result
print(classifier.predict(sc.transform([[30,87000]])))
Test dataset:
Output :
The new observation (age 30, salary 87000) is scaled with the same scaler before prediction. The output below matches the test dataset.
Kernel(linear) :- [0]
Kernel(rbf) :- [0]
7.Predicting the Test set results
y_pred = classifier.predict(X_test)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))
Output :
The output below compares the predicted result (y_pred, first column) with the actual
result (y_test, second column) for the test set.
Kernel(linear):
[[0 0] [0 0] [1 1] [1 1] [0 0] [0 0] [0 0] [1 1] [0 0] [1 0] [0 0] [0 0] [0 0] [1 1] [1 1] [1 1] [1 1] [0 0] [0 0] [1 1] [0 0] [1 1] [1 1] [0 0] [0 1] [0 0] [1 1] [1 0] [1 1] [1 0] [0 0] [0 0] [0 0] [1 1] [0 0] [0 0] [0 0] [0 0] [0 1] [0 0] [1 1] [1 1] [0 0] [0 0] [1 1] [0 1] [0 1] [1 1] [0 0] [1 1] [0 0] [0 0] [1 1] [0 1] [0 1] [0 0] [1 1] [0 0] [1 1] [1 1] [0 0] [0 0] [1 0] [0 0] [0 1] [1 1] [0 0] [0 0] [1 0] [0 0] [1 0] [0 0] [1 1] [0 0] [0 0] [1 1] [0 0] [0 0] [0 0] [0 0]]
Kernel(rbf) :
[[0 0] [0 0] [1 1] [1 1] [1 0] [0 0] [0 0] [1 1] [0 0] [1 0] [0 0] [0 0] [0 0] [1 1] [1 1] [1 1] [1 1] [0 0] [0 0] [1 1] [0 0] [1 1] [1 1] [1 0] [1 1] [0 0] [1 1] [1 0] [1 1] [1 0] [0 0] [0 0] [0 0] [1 1] [0 0] [0 0] [0 0] [0 0] [1 1] [0 0] [1 1] [1 1] [1 0] [0 0] [1 1] [1 1] [1 1] [1 1] [0 0] [1 1] [0 0] [0 0] [0 1] [1 1] [0 1] [0 0] [1 1] [0 0] [1 1] [1 1] [0 0] [0 0] [1 0] [0 0] [1 1] [1 1] [0 0] [0 0] [1 0] [0 0] [1 0] [0 0] [1 1] [0 0] [0 0] [1 1] [0 0] [0 0] [0 0] [0 0]]
8.Making the Confusion Matrix
A confusion matrix is a table that is often used to describe the performance of a
classification model on a set of test data for which the true values are known.
In scikit-learn's output, rows are the actual classes and columns are the predicted classes.
Legend :
TP- True Positive
FP- False Positive
FN- False Negative
TN- True Negative
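To show where each legend entry sits in scikit-learn's binary confusion matrix, here is a minimal sketch with made-up labels (y_true and y_pred below are hypothetical, with 1 treated as the positive class). For two classes, ravel() returns the cells in the order TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels, only to show the cell layout.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)

# Layout for labels (0, 1):
# [[TN FP]
#  [FN TP]]
tn, fp, fn, tp = cm.ravel()
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

So the top-left cell counts correctly predicted 0s (true negatives) and the bottom-right cell counts correctly predicted 1s (true positives).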
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))
Output:-
Kernel(linear):
[[42  6]
 [ 7 25]]
accuracy_score : 0.8375
Kernel(rbf):
[[39  9]
 [ 2 30]]
accuracy_score : 0.8625
Observation :
In the linear model, 42 came out as true negatives and 6 as false positives, whereas 7 came out
as false negatives and 25 as true positives. So only 13 of the 80 test outputs were predicted incorrectly.
In the rbf model, 39 came out as true negatives and 9 as false positives, whereas 2 came out
as false negatives and 30 as true positives. So only 11 of the 80 test outputs were predicted incorrectly.
Based on the confusion matrix and accuracy score, the rbf kernel performs better than the linear kernel on this test set (accuracy 0.8625 vs 0.8375).
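The error counts and accuracy scores above can be re-derived directly from the two reported matrices. This is just the arithmetic, written as a small sketch: the correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the total.

```python
import numpy as np

# Confusion matrices exactly as reported in the output above.
reported = {
    "linear": np.array([[42, 6], [7, 25]]),
    "rbf": np.array([[39, 9], [2, 30]]),
}

summary = {}
for kernel, cm in reported.items():
    correct = int(np.trace(cm))  # diagonal = correct predictions
    total = int(cm.sum())        # all test samples
    errors = total - correct     # off-diagonal = wrong predictions
    summary[kernel] = (errors, correct / total)
    print(kernel, "errors:", errors, "accuracy:", correct / total)
```

This reproduces 13 errors and 0.8375 for the linear kernel, and 11 errors and 0.8625 for rbf.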
Please note that there is only one difference between the rbf and linear models: the kernel parameter of the SVC class. The rest of the code to build the model is the same.
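Since only the kernel changes, both models can be trained in one loop. Here is a rough sketch of that idea. It runs on synthetic age/salary data with a labelling rule I made up so that it works without the CSV file, so its numbers will not match the results reported above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the age and salary columns (NOT the real data).
rng = np.random.default_rng(0)
age = rng.integers(18, 61, size=400)
salary = rng.integers(15000, 150001, size=400)
X = np.column_stack([age, salary]).astype(float)
# Hypothetical labelling rule, only so the example has two classes.
y = ((age > 45) | (salary > 100000)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

results = {}
for kernel in ("linear", "rbf"):
    # The kernel argument is the only thing that differs between the models.
    classifier = SVC(kernel=kernel, random_state=0)
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    acc = accuracy_score(y_test, y_pred)
    results[kernel] = (cm, acc)
    print(kernel)
    print(cm)
    print(acc)
```

To run it on the real data, replace the synthetic X and y with the ones loaded from the CSV in step 2.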
I hope you will be able to create both models using this blog. Happy reading!