CatBoost: unbiased boosting with categorical features
Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, Andrey Gulin

TL;DR
CatBoost introduces novel algorithms for gradient boosting that effectively handle categorical features and reduce prediction shift, resulting in superior performance on various datasets.
Contribution
The paper presents two key innovations: ordered boosting and an advanced categorical feature processing algorithm, addressing target leakage in gradient boosting.
Findings
Outperforms other boosting tools in quality across datasets
Effectively reduces prediction shift caused by target leakage
Demonstrates strong empirical results with new algorithms
Abstract
This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. Their combination leads to CatBoost outperforming other publicly available boosting implementations in terms of quality on a variety of datasets. Two critical algorithmic advances introduced in CatBoost are the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques were created to fight a prediction shift caused by a special kind of target leakage present in all currently existing implementations of gradient boosting algorithms. In this paper, we provide a detailed analysis of this problem and demonstrate that proposed algorithms solve it effectively, leading to excellent empirical results.
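The categorical-feature idea named in the abstract can be illustrated with a small sketch of an ordered target statistic: each example's category is encoded from the targets of examples that come before it in a random permutation, so its own label never leaks into its feature value. This is a minimal illustration under stated assumptions, not CatBoost's implementation; the function name `ordered_target_statistics` and the parameters `prior`, `a`, and `seed` are hypothetical names chosen for this example.

```python
import random
from collections import defaultdict


def ordered_target_statistics(categories, targets, prior, a=1.0, seed=0):
    """Encode one categorical column using only 'earlier' examples.

    Hypothetical helper for illustration. `prior` is a default value
    (for instance the mean target) and `a` is a smoothing weight.
    """
    n = len(categories)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random permutation of the examples

    sums = defaultdict(float)  # running sum of targets seen so far, per category
    counts = defaultdict(int)  # running count of examples seen so far, per category
    encoded = [0.0] * n

    for idx in order:
        c = categories[idx]
        # Use only examples that precede idx in the permutation,
        # so targets[idx] is excluded from its own encoding.
        encoded[idx] = (sums[c] + a * prior) / (counts[c] + a)
        # Update the running statistics only after encoding.
        sums[c] += targets[idx]
        counts[c] += 1
    return encoded


if __name__ == "__main__":
    cats = ["a", "a", "b"]
    ys = [1, 0, 1]
    print(ordered_target_statistics(cats, ys, prior=0.5))
```

The first example seen in each category falls back to `prior`, since no earlier examples exist for it. Ordered boosting applies the same permutation idea to the boosting steps themselves rather than to feature encoding.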
Videos
CatBoost Part 2: Building and Using Trees · YouTube
CatBoost Part 1: Ordered Target Encoding · YouTube
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification · Imbalanced Data Classification Techniques
