Boosting is an ensemble meta-algorithm used in supervised learning, primarily to reduce bias (and, to a lesser extent, variance). It sequentially combines many weak learners into a single strong learner to increase the accuracy of the model.
In this post I will explain what boosting is, the types of boosting, and how boosting algorithms work.
Why is boosting used
Let’s understand it with an example. Say we are given a data set of cat and dog images and are asked to build a machine learning model that classifies these images into separate classes. We might begin by identifying some rules, like:
1. An image with pointy ears is a cat.
2. An image with cat-shaped eyes is a cat.
3. An image with bigger limbs is a dog.
4. An image with sharpened claws is a cat.
5. An image with a wide mouth is a dog.
If we classify the data with just these rules, we may go wrong: a breed of cat with bigger limbs, for example, would get identified as a dog. Each of these rules is individually a weak learner, because on its own it cannot classify cat and dog images reliably. To make our prediction accurate, we combine the predictions of all the weak learners using a majority vote or a weighted average.
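The idea above can be sketched in a few lines of Python. The three "weak rules" and the sample animal below are made up for illustration; each rule is unreliable alone, but a simple majority vote recovers the right answer even when one rule votes wrong:

```python
# Three hand-written weak rules, each only loosely accurate on its own.

def rule_pointy_ears(animal):
    # Weak rule: pointy ears suggest a cat.
    return "cat" if animal["pointy_ears"] else "dog"

def rule_wide_mouth(animal):
    # Weak rule: a wide mouth suggests a dog.
    return "dog" if animal["wide_mouth"] else "cat"

def rule_big_limbs(animal):
    # Weak rule: bigger limbs suggest a dog.
    return "dog" if animal["big_limbs"] else "cat"

def majority_vote(animal, rules):
    # Combine the weak rules: the most common vote wins.
    votes = [rule(animal) for rule in rules]
    return max(set(votes), key=votes.count)

rules = [rule_pointy_ears, rule_wide_mouth, rule_big_limbs]

# A large-limbed cat: the limb rule alone misclassifies it as a dog,
# but the combined vote is still "cat".
large_cat = {"pointy_ears": True, "wide_mouth": False, "big_limbs": True}
print(majority_vote(large_cat, rules))  # cat
```

Boosting goes further than a plain vote: it trains the weak learners one after another and weights their votes, as described next.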
What is Boosting
Boosting is the process of converting weak learners into a strong learner. The weak learners are trained sequentially, and the model’s performance is improved by giving higher weight to the data that was previously misclassified. The entire data set is fed to the algorithm, which makes predictions; the samples it misclassifies are then given more weight, so that they get a lot more attention in the next round. We keep repeating this process until all the data are classified properly (or a stopping criterion is reached).
The basic principle of a boosting algorithm is to generate multiple weak learners and combine them into a strong learner. The weak learners are generated by running a base algorithm on different distributions (weightings) of the data set; usually the base learners are decision trees. Each iteration produces a weak rule, and after multiple iterations the weak learners are combined into a strong learner that predicts a more accurate outcome.
Let me explain it step by step:
Step 1: The base algorithm reads the data and assigns an equal weight to each sample.
Step 2: If the first base learner makes prediction errors, we pay higher attention to the observations it got wrong by increasing their weights. Then we apply the next base learner.
Step 3: Repeat Step 2 until all the data are classified properly (or the maximum number of learners is reached).
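The three steps above can be sketched as a short training loop. This is only a rough illustration, not the exact AdaBoost update: I simply double the weight of misclassified samples each round (real AdaBoost computes the update from the learner's weighted error), and the tiny data set is made up:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy 1-D data set, invented for illustration.
X = np.array([[0], [1], [2], [3], [4], [5]])
y = np.array([0, 0, 1, 0, 1, 1])

n_rounds = 5
weights = np.full(len(X), 1 / len(X))   # Step 1: equal weight per sample
learners = []

for _ in range(n_rounds):
    # Fit the next base learner on the current sample weights.
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=weights)
    learners.append(stump)

    miss = stump.predict(X) != y
    if not miss.any():                   # Step 3: stop when all are correct
        break
    weights[miss] *= 2                   # Step 2: boost misclassified weights
    weights /= weights.sum()             # renormalize to a distribution

print(len(learners), "weak learners trained")
```

Each round, the new stump is pushed to concentrate on the samples the previous stumps got wrong, which is exactly the "higher attention" idea in Step 2.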
Types of Boosting
1. AdaBoost (Adaptive Boosting)
2. Gradient Tree Boosting
AdaBoost combines several weak learners into a single strong learner. The algorithm follows a few steps: it first assigns equal weight to all the data, then fits a decision stump on a single feature. The results of the decision stump are analyzed, and any misclassified data are assigned higher weights. A new decision stump is then fitted taking these higher weights into account, and the process continues until all the data are classified properly (or the estimator limit is reached). A decision stump is nothing but a tree with a single root node and two leaf nodes. AdaBoost can be used for both classification and regression problems.
In the image above we can clearly see that in box 1 all the samples are weighted equally, but three (+) signs are wrongly predicted as (-). These (+) signs get more weight in box 2 (the second learner), and the process continues until all the (+) and (-) samples are properly classified, as in box 4.
Example code snippet (adapted from Kaggle):
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

model = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2),
    n_estimators=50,
    learning_rate=0.1,
)
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('The accuracy of the AdaBoostClassifier is', metrics.accuracy_score(prediction, test_y))
You can tune the model with various parameters to optimize the performance, I’ve mentioned some of the most important parameters.
1. base_estimator - specifies the base ML algorithm used as the weak learner (renamed to estimator in scikit-learn 1.2+).
2. n_estimators - the maximum number of estimators at which boosting is terminated. In case of a perfect fit, the learning procedure is stopped early.
3. learning_rate - controls the contribution of each weak learner to the final combination. There is a trade-off between learning_rate and n_estimators.
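One common way to explore the n_estimators/learning_rate trade-off is a grid search. The sketch below uses a synthetic data set from make_classification and a small, arbitrary parameter grid chosen just for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data, for illustration only.
X, y = make_classification(n_samples=200, random_state=0)

# Arbitrary example grid over the two interacting parameters.
param_grid = {
    "n_estimators": [25, 50, 100],
    "learning_rate": [0.1, 0.5, 1.0],
}

search = GridSearchCV(
    AdaBoostClassifier(DecisionTreeClassifier(max_depth=1)),
    param_grid,
    cv=3,  # 3-fold cross-validation per parameter combination
)
search.fit(X, y)
print(search.best_params_)
```

Typically a smaller learning_rate needs a larger n_estimators to reach the same accuracy, which is the trade-off mentioned above.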
In Gradient Boosting, the base learners are generated sequentially so that each new learner improves on the ones before it. Each new learner is fitted to the negative gradient of a loss function (for example, squared error for regression), so the ensemble gradually minimizes the loss using the gradient descent method. The big difference from AdaBoost is that Gradient Boosting does not add weight to the misclassified data of the previous model; instead, it directly optimizes the loss function. Gradient Boosting has three components:
1. Loss function - the error that needs to be minimized.
2. Weak learners - for computing predictions and forming the strong learner.
3. Additive model - weak learners are added one at a time, each reducing the remaining loss.
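The three components can be seen in a minimal from-scratch sketch. For squared loss the negative gradient is simply the residual y - F(x), so each weak learner (a shallow regression tree) is fitted to the current residuals, and the additive model sums their learning-rate-scaled corrections. The sine-curve data is synthetic, for illustration only:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data: noise-free sine curve.
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel()

learning_rate = 0.1
F = np.full_like(y, y.mean())   # start from a constant prediction
trees = []

for _ in range(100):
    residual = y - F                      # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)                 # weak learner fits the residuals
    F += learning_rate * tree.predict(X)  # additive model update
    trees.append(tree)

print("training MSE:", float(np.mean((y - F) ** 2)))  # shrinks toward 0
```

Swapping in a different differentiable loss only changes how the residual (gradient) is computed, which is exactly the generalization the scikit-learn implementation below provides.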
In Python’s scikit-learn library this is implemented as Gradient Tree Boosting, also known as Gradient Boosted Decision Trees (GBDT). It is a generalization of boosting to arbitrary differentiable loss functions. The sklearn.ensemble module provides gradient boosted decision trees for both classification and regression.
from sklearn.ensemble import GradientBoostingClassifier  # for classification
from sklearn.ensemble import GradientBoostingRegressor   # for regression

model = GradientBoostingClassifier(max_depth=1, n_estimators=100, learning_rate=1.0)
model.fit(train_X, train_y)
prediction = model.predict(test_X)
1. n_estimators - the maximum number of estimators at which boosting is terminated. In case of a perfect fit, the learning procedure is stopped early.
2. learning_rate - controls the contribution of each weak learner to the final combination. There is a trade-off between learning_rate and n_estimators.
3. max_depth - the maximum depth of the individual regression estimators.
XGBoost stands for eXtreme Gradient Boosting. It is an advanced implementation of the gradient boosting method, designed for computational speed and model efficiency. XGBoost was introduced because standard gradient boosting was slow to train, since the models are fitted sequentially. To achieve its speed and efficiency, it supports several features:
Parallelization of tree construction using all of your CPU cores during training.
Distributed Computing for training very large models using a cluster of machines.
Out-of-Core Computing for very large datasets that don’t fit into memory.
Cache Optimization of data structures and algorithms to make the best use of hardware.
Example code snippet (adapted from Kaggle):
from xgboost.sklearn import XGBClassifier  # for classification
from xgboost.sklearn import XGBRegressor   # for regression

model = XGBClassifier()
model.fit(train_x, train_y)
prediction = model.predict(test_x)
I hope you now have an idea of how the different types of boosting algorithms work in machine learning.
Thanks for reading 😊