Ensemble machine learning is a widely used technique. Different bagging and boosting machine learning algorithms have proven to be effective ways of quickly training machine learning algorithms.
The benefit of using an ensemble machine learning algorithm is that you can take advantage of multiple hypotheses to understand the most effective solution to your problem.
The standard base learner generates multiple hypotheses and tested to find the most optimized solution.
Different techniques used in Ensemble Machine learning
There are three techniques used in ensemble machine learning. These techniques are stacking, bagging, and boosting machine learning.
The next section describes and reviews different techniques.
Stacking Ensemble Learning
I have included stacking ensemble learning in this overview as it is an essential technique for gaining a deeper understanding of a dataset.
However, I will only touch upon this technique today and describe the process to introduce it. The reason for this is because stacking ensemble learning involves stacking multiple algorithmic methods such as different classification and regression algorithms to improve results.
As you can imagine stacking the algorithms generally provides a much better result than in a single algorithm. You can learn more about it here.
As a general rule, stacking ensemble learning is also more complex to implement.
For the remainder of the tutorial, we will focus on how you can get the benefits of ensemble learning from implementing one algorithm through bagging or boosting machine learning.
Bagging ensemble learning algorithms
Bagging machine learning is also known as bootstrap aggregation learning.
To understand how bagging works, you first need to understand the statistical principle of bootstrap.
Bootstrap is used to estimate the statistical qualities of a dataset. The bootstrap error takes multiple random samples of the data and determines the mean.
By combining multiple estimations, the bootstrap process is then able to make a better overall prediction of the global mean of the dataset as well as other metrics such as standard deviation.
How is bootstrap used in bagging machine learning?
In bagging machine learning, the bootstrap method is applied to aggregate the predictions across multiple samples. This process of aggregation helps to reduce variance in the algorithm.
There are two types of bagging estimators used in ensemble machine learning:
- Bagging meta-estimator
- Random forest decision trees
You can read more about implementing random forest classification algorithms here.
The Bagging meta-estimator
Bagging meta-estimator applies in both classification (BaggingClassifier) and regression (BaggingRegressor) problems. It follows the typical bagging technique to make predictions.
Following are the steps for the bagging meta-estimator algorithm: (source)
- Create random samples (using bootstrap)
- Make sure each subset of data includes all features.
- A user-specified base estimator fits on each of these smaller sets.
- Predictions from each model are combined to get the final result.
More commonly, however, you will see people using Random Forest decision trees for bagging machine learning.
The random forest algorithm
The random forest algorithm works by creating multiple decision trees at random points within the experimental space.
These decision trees are designed to go deep so that they do not overfit the data.
Below is an image showing how random forest bagging algorithms implement.
Boosting machine learning algorithms
The general principle of boosting machine learning is that it takes a weaker learner and combines it with a strong rule to create a stronger learner. Substantially it is promoting the algorithm.
Boosting machine learning algorithms can enhance the features of the input data and use them to make better overall predictions.
There are many different boosting algorithms. Some of the algorithms are listed below:
- AdaBoost: Adaptive boosting assigns weights to incorrect predictions so that they can be corrected
- GBM: Gradient boosting uses boosting to combine weak and strong learners
- XGBM: A more powerful, and prevalent version of Gradient Boosting
- Light GBM: The best for large datasets. Works at a leaf level (part of the decision trees) as opposed to a level-wise approach to optimize and boost the learners
- CatBoost: A boosting machine learning algorithm that automatically pre-processes categorical data
XGBoost (Extreme Gradient Boosting) is the most popular boosting machine learning algorithm. XGBoost can use a variety of regularization in addition to gradient boosting to prevent overfitting and improve the performance of the algorithm.
Random Forest Bagging vs. XGBoost Boosting Machine Learning
Below is a table outlining the pros and cons of Random Forest and XGBoost ensemble learning.
|Random Forest||Easy to tune|
Doesn’t overfit easily
Can manage missing data well
Handle large numbers of inputs
|Large numbers of trees make it slow in product-ion|
Not great with highly categorical data – easy to bias
|XGBoost||Can solve most objective functions including ranking and position regression||Longer training time|
Hard to tune
Sensitive to overfitting
You don’t have to choose one or the other algorithm when you are experimenting. Try both random forest bagging and XGBoost boosting machine learning. This Kaggle kernel (link) is a great example of how to experiment with both.
Step by step implementations of bagging and boosting machine learning
Below is an example of what the code looks like in python.
In addition to the class, you need to define the number of estimators or ‘trees’. To determine the correct number, play around with a few different values and see which gives you the best result.
Implementing XGBoost is actually pretty simple as you will see from the code below:
You can optimize the algorithm by tuning the hyperparameters with the GridSearch method. To learn more about Grid Search check out this article on hyperparameter tuning.
As you can see, implementing the top bagging and boosting machine learning algorithms is relatively simple.
Once you understand how these two types of algorithm work it is easy to understand how to use them to improve your machine learning predictions.