CatBoost (Category Boosting), LightGBM (Light Gradient Boosted Machine), and XGBoost (eXtreme Gradient Boosting) are all gradient boosting algorithms. Before diving into their similarities and differences in characteristics and performance, we must understand the term ensemble learning and how it relates to gradient boosting.
Table of Contents
- Ensemble Learning
- CatBoost vs. LightGBM vs. XGBoost Characteristics
- Improving Accuracy, Speed, and Controlling Overfitting
- Performance Comparison
Ensemble Learning
Ensemble Learning is a technique that combines predictions from multiple models to get a prediction that would be more stable and generalize better. The idea is to average out different models’ individual mistakes to reduce the risk of overfitting while maintaining strong prediction performance.
In regression, the overall prediction is typically the mean of the individual tree predictions, whereas in classification, class probabilities are averaged across all trees and the class with the highest averaged probability is the final predicted class.
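The two aggregation rules just described can be sketched in a few lines. The helper names are illustrative, not from any library:

```python
# Sketch (not library code) of how ensemble predictions are aggregated.
# Regression: average the trees' numeric outputs.
# Classification: average per-class probabilities, then take the argmax.

def aggregate_regression(tree_preds):
    """Mean of individual tree predictions."""
    return sum(tree_preds) / len(tree_preds)

def aggregate_classification(tree_probs):
    """Average class-probability vectors across trees, pick the best class."""
    n_trees, n_classes = len(tree_probs), len(tree_probs[0])
    avg = [sum(p[c] for p in tree_probs) / n_trees for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg

# Three trees predict for one sample:
print(aggregate_regression([2.0, 3.0, 4.0]))  # 3.0
label, avg = aggregate_classification([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
print(label)  # class 1 wins: averaged probabilities are [0.366..., 0.633...]
```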
There are two main classes of ensemble learning methods, namely bagging and boosting, although ML (Machine Learning) algorithms can be a combination of both with certain variations.
- The bagging method builds models in parallel, each on a random subset of the data (sampled with replacement), and aggregates the predictions of all models
- The boosting method builds models in sequence on the whole dataset, with each model improving on the previous model's errors
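The contrast can be sketched with a toy example, using a deliberately weak learner (the mean of its training targets). All names are illustrative, not library APIs:

```python
import random

# Toy sketch contrasting bagging and boosting. The "weak learner" here is just
# the mean of its training targets -- purely illustrative.

def fit_mean(ys):
    return sum(ys) / len(ys)

def bagging(ys, n_models=10, seed=0):
    """Each model fits a bootstrap sample (with replacement); predictions are averaged."""
    rng = random.Random(seed)
    models = [fit_mean([rng.choice(ys) for _ in ys]) for _ in range(n_models)]
    return sum(models) / len(models)

def boosting(ys, n_models=10, lr=0.5):
    """Each model fits the residual error left by the models before it."""
    pred = 0.0
    for _ in range(n_models):
        residuals = [y - pred for y in ys]
        pred += lr * fit_mean(residuals)  # shrink each step's correction
    return pred

ys = [1.0, 2.0, 3.0, 4.0]
print(bagging(ys))   # near the mean (2.5), varies with the bootstrap draws
print(boosting(ys))  # converges toward the mean by repeatedly chasing residuals
```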
CatBoost, LightGBM, and XGBoost are all variations of gradient boosting algorithms. Now that you understand the difference between bagging and boosting, we can move on to the differences in how the algorithms implement gradient boosting.
CatBoost vs. LightGBM vs. XGBoost Characteristics
The table below summarizes the differences between the three algorithms; read on for an elaboration of each characteristic.

| Characteristic | CatBoost | LightGBM | XGBoost |
|---|---|---|---|
| Tree symmetry | Symmetric | Asymmetric (leaf-wise growth) | Asymmetric (level-wise growth) |
| Splitting method | Greedy | Gradient-based One-Side Sampling (GOSS) | Pre-sorted and histogram-based |
| Type of boosting | Ordered boosting | Gradient boosting | Gradient boosting |
| Categorical column parameters | `cat_features`, `one_hot_max_size` | `categorical_feature` | NA |
Tree Symmetry
In CatBoost, symmetric trees, or balanced trees, are trees in which the splitting condition is consistent across all nodes at the same depth. LightGBM and XGBoost, on the other hand, produce asymmetric trees, meaning the splitting condition can differ from node to node at the same depth.

For symmetric trees, this means that the chosen splitting condition must result in the lowest loss across all nodes of the same depth. Benefits of the balanced tree architecture include faster computation and evaluation and better control of overfitting.
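To see where the evaluation speed comes from, here is a minimal sketch of predicting with a symmetric (oblivious) tree: because every node at a given depth shares one condition, a sample's leaf is just a bit pattern. This is illustrative, not CatBoost's actual implementation:

```python
# Sketch of why symmetric trees evaluate fast: one split condition per depth
# means the leaf index is a handful of comparisons plus an array lookup.

def symmetric_tree_predict(x, splits, leaf_values):
    """splits: one (feature_index, threshold) per depth; leaf_values: 2**depth entries."""
    idx = 0
    for feature, threshold in splits:          # same condition for every node at this depth
        idx = (idx << 1) | (x[feature] > threshold)
    return leaf_values[idx]

splits = [(0, 0.5), (1, 2.0)]                  # depth-2 tree -> 4 leaves
leaf_values = [10.0, 20.0, 30.0, 40.0]
print(symmetric_tree_predict([1.0, 3.0], splits, leaf_values))  # both conditions true -> leaf 3 -> 40.0
```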
Even though LightGBM and XGBoost both grow asymmetric trees, LightGBM grows leaf-wise while XGBoost grows level-wise. To put it simply, we can think of LightGBM as growing the tree selectively, resulting in smaller and faster models compared to XGBoost.
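A toy sketch of the difference under a fixed budget of splits, with gains contrived so that the best split always lies on the same path (neither library scores splits this way; the point is only the resulting tree shape):

```python
# With the same number of splits, level-wise growth yields a shallow balanced
# tree, while leaf-wise growth may concentrate all splits down one path.

def level_wise_depth(n_splits):
    """Split every leaf on a level before descending: depth grows logarithmically."""
    depth, leaves, used = 0, 1, 0
    while used + leaves <= n_splits:
        used += leaves        # split all current leaves
        leaves *= 2
        depth += 1
    return depth

def leaf_wise_depth(n_splits):
    """If the best gain always sits in the deepest leaf, the tree becomes a chain."""
    return n_splits           # each split extends the same path by one level

print(level_wise_depth(7))    # 3 (a balanced depth-3 tree uses 1 + 2 + 4 = 7 splits)
print(leaf_wise_depth(7))     # 7 (a pathological chain)
```

This is why LightGBM bounds complexity by the leaf count rather than the depth, as discussed below under overfitting control.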

Splitting Method
Splitting Method refers to how the splitting condition is determined.
In CatBoost, a greedy method is used: a list of candidate feature-split pairs is evaluated for the leaf, and the split that results in the smallest penalty is selected.
In LightGBM, Gradient-based One-Side Sampling (GOSS) keeps all data instances with large gradients and randomly samples the data instances with small gradients. The gradient refers to the slope of the tangent of the loss function. Data points with larger gradients have higher errors and are important for finding the optimal split point, while data points with smaller gradients have smaller errors and are important for keeping the learned decision trees accurate. This sampling technique means fewer data instances are used to train each tree, and hence faster training.
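The sampling step described above can be sketched as follows; `goss_sample` and its default fractions are illustrative, not LightGBM's API:

```python
import random

# Sketch of GOSS: keep the top-a fraction of instances by |gradient|, randomly
# sample a b fraction of the rest, and up-weight the sampled small-gradient
# instances by (1 - a) / b so the total gradient stays approximately unbiased.

def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    rng = random.Random(seed)
    order = sorted(range(len(gradients)), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = int(a * len(gradients))
    keep = order[:top_n]                                   # large-gradient instances
    rest = order[top_n:]
    sampled = rng.sample(rest, int(b * len(gradients)))    # random small-gradient subset
    weights = {i: 1.0 for i in keep}
    weights.update({i: (1 - a) / b for i in sampled})      # compensate the discarded mass
    return weights

grads = [0.01 * i for i in range(100)]
w = goss_sample(grads)
print(len(w))  # 30 of 100 instances are used to train this tree
```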
In XGBoost, the pre-sorted algorithm enumerates all features and sorts their values. A linear scan is then done to decide the best split, i.e. the feature and feature value that yield the most information gain. The histogram-based algorithm works the same way, but instead of considering all feature values, it groups them into discrete bins and finds the split point over the bins instead, which is more efficient than the pre-sorted algorithm although still slower than GOSS.
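Both strategies can be sketched on a single feature. The variance-based score below is a simple stand-in for XGBoost's gradient statistics, and the function names are illustrative:

```python
# Pre-sorted vs. histogram-based split finding on one feature. The split score
# is the sum of squared errors on each side (lower is better).

def sse(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split_presorted(xs, ys):
    """Sort by feature value, then linearly scan every boundary."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        score = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best

def best_split_histogram(xs, ys, n_bins=4):
    """Bucket feature values into bins; only bin edges are candidate splits."""
    lo, hi = min(xs), max(xs)
    edges = [lo + (hi - lo) * k / n_bins for k in range(1, n_bins)]
    best = (None, float("inf"))
    for t in edges:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        score = sse(left) + sse(right)
        if score < best[1]:
            best = (t, score)
    return best

xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
print(best_split_presorted(xs, ys))   # exact scan: threshold 7.0, score 0.0
print(best_split_histogram(xs, ys))   # far fewer candidates, still a clean split here
```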
Type of Boosting
There are variations in how the data is selected for training. Ordered boosting, used by CatBoost, has each model train on one subset of the data and evaluate residuals on a different subset, rather than training and evaluating on the same examples. Benefits of ordered boosting include increased robustness to unseen data.
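A minimal sketch of the idea, with a running mean of the targets standing in for the sequence of models (purely illustrative, not CatBoost's implementation):

```python
import random

# Sketch of the ordered-boosting idea: each example's residual is computed by a
# "model" that has only seen the examples before it in a random permutation, so
# no example's target leaks into its own residual.

def ordered_residuals(ys, seed=0):
    order = list(range(len(ys)))
    random.Random(seed).shuffle(order)
    residuals = [0.0] * len(ys)
    total, count = 0.0, 0
    for i in order:
        pred = total / count if count else 0.0  # model fit on preceding examples only
        residuals[i] = ys[i] - pred
        total += ys[i]                          # only now does example i join the model
        count += 1
    return residuals

print(ordered_residuals([1.0, 2.0, 3.0, 4.0]))
```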
Categorical Columns
The parameters for specifying categorical columns in each algorithm are as follows:
- CatBoost: `cat_features`, `one_hot_max_size`
- LightGBM: `categorical_feature`
- XGBoost: NA
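Since XGBoost offers no categorical-column parameter here, categorical columns must be encoded before training. A minimal one-hot sketch (in practice you would use `pandas.get_dummies` or scikit-learn's `OneHotEncoder`):

```python
# Minimal one-hot encoder for a single categorical column, to show the kind of
# preprocessing XGBoost requires that CatBoost and LightGBM handle natively.

def one_hot(column):
    categories = sorted(set(column))
    encoded = [[1 if value == cat else 0 for cat in categories] for value in column]
    return encoded, categories

encoded, cats = one_hot(["red", "green", "red", "blue"])
print(cats)     # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```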
Improving Accuracy, Speed, and Controlling Overfitting
In ensemble learning, averaging the prediction across different models helps with overfitting. However, as with any tree-based algorithm, there is still a possibility of overfitting. Overfitting can be handled by splitting the dataset into train, validation, and test sets, and by enabling cross-validation, early stopping, or tree pruning. For the sake of comparing the different algorithms, we will focus on controlling overfitting using model parameters.
Note that to control the complexity of the model, XGBoost uses the parameter max_depth (since it grows level-wise) whereas LightGBM uses the parameter num_leaves (since it grows leaf-wise).

Performance Comparison
Various accuracy and speed benchmarks have been performed on different datasets. I find it hasty to generalize algorithm performance from a few datasets, especially if overfitting and numerical/categorical variables are not properly accounted for.
However, generally, the literature suggests that XGBoost and LightGBM yield similar accuracy, while CatBoost and LightGBM train much faster than XGBoost, especially on larger datasets.
I hope you now have a better understanding of the three most popular ML boosting algorithms: CatBoost, LightGBM, and XGBoost, which differ mainly in structure. In practice, data scientists usually try multiple algorithms against their data, so don't rule out any algorithm just yet! Besides interpretability, performance, and training-time considerations when choosing between algorithms, it is also crucial to fine-tune the models via hyperparameter tuning and to control overfitting via pipeline architecture or hyperparameters.
Related Links
CatBoost
- Documentation: https://catboost.ai/en/docs/
- Official GitHub: https://github.com/catboost/catboost
- Paper: http://learningsys.org/nips17/assets/papers/paper_11.pdf
LightGBM
- Documentation: https://lightgbm.readthedocs.io/
- Official GitHub: https://github.com/microsoft/LightGBM
- Paper: https://proceedings.neurips.cc/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf
XGBoost
- Documentation: https://xgboost.readthedocs.io/
- Official GitHub: https://github.com/dmlc/xgboost
- Paper: https://arxiv.org/pdf/1603.02754.pdf