AIML.com


What is XGBoost? How does it improve upon standard GBM?


Related Questions:
– What is Gradient Boosting (GBM)?
– What are the key hyperparameters for a GBM model?
– What is Regularization?

XGBoost, which stands for Extreme Gradient Boosting, is a modern open-source implementation of the Gradient Boosting Machine (GBM) algorithm and works largely the same way as standard GBM. Like regular GBM, it builds trees one after another, fitting each new tree to the residuals (more generally, the gradients of the loss) left by the trees built so far. To predict on a new observation, it sums the outputs of all the trees, each scaled by the learning rate. XGBoost provides parallel tree boosting and has become a popular machine learning algorithm for its scalability, accuracy, and speed.
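The boosting loop described above can be sketched in a few lines. This is a minimal sketch of plain gradient boosting, not XGBoost itself, assuming squared-error loss, a single numeric feature, and one-split decision stumps as the trees. The function names fit_stump, fit_gbm, and predict_gbm are hypothetical and chosen only for illustration.

```python
# Minimal gradient boosting for squared-error regression on one feature.
# Teaching sketch of the generic GBM idea, not XGBoost's algorithm:
# each round fits a one-split "stump" to the current residuals, and the model
# predicts base_score + learning_rate * (sum of all stump outputs).

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error.
    Assumes xs contains at least two distinct values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # drop the SSE, keep (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gbm(xs, ys, n_rounds=50, learning_rate=0.1):
    base_score = sum(ys) / len(ys)  # start from the mean target
    preds = [base_score] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual y - pred.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each new tree's contribution by the learning rate.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base_score, learning_rate, stumps


def predict_gbm(model, x):
    base_score, learning_rate, stumps = model
    return base_score + learning_rate * sum(predict_stump(s, x) for s in stumps)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
model = fit_gbm(xs, ys)
print(round(predict_gbm(model, 2), 2), round(predict_gbm(model, 5), 2))
```

With a learning rate of 0.1, each stump moves the predictions only a tenth of the way toward the current residuals, which is why many rounds are needed. XGBoost keeps this overall loop but changes how each tree is grown and scored.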

Originally developed by Tianqi Chen, XGBoost improves upon the standard GBM framework with several additional features and optimizations, including:

  1. Regularization: L1 (Lasso) and L2 (Ridge) penalty terms are added to the objective function to prevent overfitting and improve generalization. These terms penalize large leaf weights (the values each leaf predicts), and a separate penalty (the gamma parameter) charges a fixed cost for every additional leaf in a tree.
  2. Tree pruning: XGBoost grows each tree up to the maximum depth and then prunes backward, removing splits whose reduction in loss does not exceed the gamma threshold. This keeps trees smaller and less prone to overfitting.
  3. Parallel processing: XGBoost can use multiple CPU cores to speed up training on large datasets. Trees are still added one after another; the parallelism lies in finding the best split within each tree, where candidate features are evaluated concurrently.
  4. Handling missing data: XGBoost handles missing values natively. During training it learns a default direction for missing values at each split, so it can both train and make predictions when some feature values are missing.
  5. Built-in cross-validation: XGBoost can run k-fold cross-validation across boosting iterations, which makes it straightforward to evaluate a model and tune parameters such as the number of boosting rounds.
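The regularization, pruning, and missing-value items above can be illustrated with the split-scoring formulas from the XGBoost paper (Chen and Guestrin, KDD 2016). The sketch below assumes squared-error loss, so each row's gradient is g = prediction - y and its hessian is h = 1. The arguments lam (L2 penalty), gamma (per-leaf penalty), and alpha (L1 penalty) mirror the library's parameters only in spirit; applying alpha by soft-thresholding the gradient sum is an assumption of this sketch, and all function names are hypothetical.

```python
# Sketch of XGBoost-style split scoring with squared-error loss
# (gradient g = pred - y, hessian h = 1). Hypothetical helper names;
# an illustration of the formulas, not the library's code.

def soft_threshold(g_sum, alpha):
    """L1 (alpha) penalty, assumed here to shrink the gradient sum toward zero."""
    if g_sum > alpha:
        return g_sum - alpha
    if g_sum < -alpha:
        return g_sum + alpha
    return 0.0


def leaf_weight(g_sum, h_sum, lam, alpha=0.0):
    """Optimal leaf value -G / (H + lambda); larger lam shrinks the weight."""
    return -soft_threshold(g_sum, alpha) / (h_sum + lam)


def leaf_score(g_sum, h_sum, lam, alpha=0.0):
    return soft_threshold(g_sum, alpha) ** 2 / (h_sum + lam)


def split_gain(gl, hl, gr, hr, lam, gamma, alpha=0.0):
    """Loss reduction from splitting a node, minus the per-leaf penalty gamma.
    A split whose gain is not positive is a candidate for pruning."""
    return 0.5 * (leaf_score(gl, hl, lam, alpha)
                  + leaf_score(gr, hr, lam, alpha)
                  - leaf_score(gl + gr, hl + hr, lam, alpha)) - gamma


def score_split_with_missing(xs, grads, hess, threshold, lam, gamma, alpha=0.0):
    """Score one threshold. Rows with x=None are tried on each side, and the
    side with the higher gain becomes the default direction for missing values."""
    gl = sum(g for x, g in zip(xs, grads) if x is not None and x <= threshold)
    hl = sum(h for x, h in zip(xs, hess) if x is not None and x <= threshold)
    gr = sum(g for x, g in zip(xs, grads) if x is not None and x > threshold)
    hr = sum(h for x, h in zip(xs, hess) if x is not None and x > threshold)
    gm = sum(g for x, g in zip(xs, grads) if x is None)
    hm = sum(h for x, h in zip(xs, hess) if x is None)
    gain_left = split_gain(gl + gm, hl + hm, gr, hr, lam, gamma, alpha)
    gain_right = split_gain(gl, hl, gr + gm, hr + hm, lam, gamma, alpha)
    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right


ys = [1.0, 1.2, 3.0, 3.1, 0.9]
xs = [1.0, 2.0, 5.0, 6.0, None]             # last row has a missing feature value
preds = [0.0] * len(ys)                     # current ensemble predictions
grads = [p - y for p, y in zip(preds, ys)]  # g = pred - y for squared error
hess = [1.0] * len(ys)                      # h = 1 for squared error
print(score_split_with_missing(xs, grads, hess, threshold=3.0, lam=1.0, gamma=0.0))
print(score_split_with_missing(xs, grads, hess, threshold=3.0, lam=1.0, gamma=1.0))
```

Here the row with the missing value (y = 0.9) resembles the rows below the threshold, so sending missing values left yields the larger gain. Raising gamma to 1.0 makes that gain negative, which is the condition under which the split would be pruned.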

The following figure compares the runtime of XGBoost with that of other gradient boosting implementations:

Comparing runtime for different Gradient Boosting Algorithms (Source: Video snippet by Tianqi Chen in KDD 2016)

In recent years, the XGBoost library has become increasingly popular, having helped many teams win Kaggle competitions on structured data. XGBoost is available in multiple programming languages, including R, Python, Scala, Julia, and Perl.

Below are some useful resources on XGBoost:

Video Explanation

Talk by Tianqi Chen, the creator of the XGBoost Library, at the LA Data Science group (June 2016)
