How the LightGBM algorithm works

LightGBM is a gradient boosting ensemble method that is used by the Train Using AutoML tool and is based on decision trees. As with other decision tree-based methods, LightGBM can be used for both classification and regression. LightGBM is optimized for high performance with distributed systems.

LightGBM grows decision trees leaf-wise: at each step, it splits only the single leaf that yields the largest gain, rather than splitting every leaf on a level. Leaf-wise trees can overfit, especially on smaller datasets. Limiting the tree depth can help avoid overfitting.
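As a minimal sketch of how tree growth might be constrained, the following uses parameter names from the open-source LightGBM Python package (max_depth, num_leaves, min_data_in_leaf). The values and the train_model helper are illustrative assumptions, not settings used by the Train Using AutoML tool.

```python
# Illustrative values only (assumptions); tune them for your own data.
params = {
    "objective": "regression",
    "max_depth": 6,           # cap on tree depth to limit overfitting
    "num_leaves": 31,         # cap on leaves per tree; keep it <= 2 ** max_depth
    "min_data_in_leaf": 20,   # avoid leaves supported by very few samples
}


def train_model(X, y, num_rounds=100):
    """Hypothetical helper: train a booster with the constrained parameters."""
    import lightgbm as lgb  # imported here so the params can be inspected without it

    train_set = lgb.Dataset(X, label=y)
    return lgb.train(params, train_set, num_boost_round=num_rounds)
```

Because a tree of depth d can hold at most 2^d leaves, keeping num_leaves at or below that bound makes both limits meaningful.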

LightGBM uses a histogram-based method in which the values of each feature are bucketed into bins based on a histogram of their distribution. The bins, rather than the individual data points, are then used to iterate, calculate the gain, and split the data. This method can also be optimized for sparse datasets. Another characteristic of LightGBM is exclusive feature bundling, in which the algorithm combines mutually exclusive features (features that rarely take nonzero values at the same time) into a single feature to reduce dimensionality, making training faster and more efficient.
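The histogram idea can be illustrated with a simplified, pure-Python sketch for a single feature. It assumes equal-width bins, a squared-error style gain, and a hypothetical function name (histogram_split); LightGBM's actual binning and gain computation are more elaborate.

```python
def histogram_split(values, gradients, n_bins=4):
    """Find the best bin boundary for one feature using per-bin gradient sums."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    # Assign each value to a bin index in [0, n_bins - 1].
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]

    # Build the histogram: gradient sum and sample count per bin.
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for b, g in zip(bins, gradients):
        grad_sum[b] += g
        count[b] += 1

    total_g, total_n = sum(grad_sum), len(values)
    best_bin, best_gain = None, 0.0
    left_g, left_n = 0.0, 0
    # Scan bin boundaries only: n_bins - 1 candidates instead of one per data point.
    for b in range(n_bins - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        gain = left_g**2 / left_n + right_g**2 / right_n - total_g**2 / total_n
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain
```

Once the histogram is built, the cost of evaluating candidate splits depends on the number of bins rather than the number of rows, which is where the speedup comes from.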

Gradient-based One-Side Sampling (GOSS) is used for sampling the dataset in LightGBM. GOSS gives data points with larger gradients more influence when calculating the gain, because these instances have not yet been fit well by the model and so contribute more to training. All data points with large gradients are kept, while data points with smaller gradients are randomly sampled: most are removed, and the retained ones are weighted up to compensate for those removed and maintain accuracy. For the same sampling rate, this method typically gives better results than uniform random sampling.
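The sampling step can be sketched as follows, based on the description in the Ke et al. paper. The function name goss_sample is hypothetical, and the default rates are illustrative assumptions; this is a simplified illustration, not LightGBM's implementation.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) selected by a simplified GOSS step."""
    n = len(gradients)
    # Rank instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)

    # Always keep the instances with the largest gradients.
    top = order[:n_top]
    # Randomly keep a subset of the remaining small-gradient instances.
    rng = random.Random(seed)
    other = rng.sample(order[n_top:], n_other)

    # Scale up the sampled small-gradient instances so that their total
    # contribution approximates that of the full small-gradient set.
    amplify = (1.0 - top_rate) / other_rate
    indices = top + other
    weights = [1.0] * n_top + [amplify] * n_other
    return indices, weights
```

With 10 instances and the default rates, the 2 largest-gradient instances are kept with weight 1, and 1 of the remaining 8 is kept with a weight of about 8, so the gain estimate still roughly reflects the full dataset.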

Additional Resources

Ke, Guolin, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree." Advances in Neural Information Processing Systems 30 (2017).

LightGBM documentation
