Intra-tree Column Subsampling Hinders XGBoost Learning of Ratio-like Interactions
Mykola Pinchuk

TL;DR
This paper investigates how intra-tree column subsampling in XGBoost impairs the model's ability to learn ratio-like interactions. The effect appears when the ratio is not supplied as an explicit feature and must be synthesized from its component features, and it reduces predictive performance.
Contribution
The paper demonstrates that intra-tree column subsampling hampers the learning of ratio-like interactions in gradient boosted trees. It highlights two remedies for such problems: engineering the ratio feature explicitly, or avoiding intra-tree column subsampling.
Findings
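Intra-tree column subsampling (colsample_bylevel, colsample_bynode) reduces performance when the signal is a ratio-like interaction that the model must build from primitive features.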
Including engineered ratio features mitigates the negative effect.
The decrease in test PR-AUC can be as high as 54%.
Abstract
Many applied problems contain signal that becomes clear only after combining multiple raw measurements. Ratios and rates are common examples. In gradient boosted trees, this combination is not an explicit operation: the model must synthesize it through coordinated splits on the component features. We study whether intra-tree column subsampling in XGBoost makes that synthesis harder. We use two synthetic data generating processes with cancellation-style structure. In both, two primitive features share a strong nuisance factor, while the target depends on a smaller differential factor. A log ratio cancels the nuisance and isolates the signal. We vary colsample_bylevel and colsample_bynode over s in {0.4, 0.6, 0.8, 0.9}, emphasizing mild subsampling (s >= 0.8). A control feature set includes the engineered ratio, removing the need for synthesis. Across both processes, intra-tree column…
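The cancellation-style structure described in the abstract can be sketched with the standard library alone. This is a minimal illustration, not the paper's actual data generating process: the function name make_cancellation_rows, the Gaussian factors, the standard deviations, the label threshold, the baseline rate of 1.0, and the choice to vary one parameter at a time are all assumptions made for the example. Only the parameter names colsample_bylevel and colsample_bynode and the rates {0.4, 0.6, 0.8, 0.9} come from the text.

```python
import math
import random


def make_cancellation_rows(n, nuisance_sd=2.0, signal_sd=0.3, seed=0):
    """Hypothetical DGP: x1 and x2 share a strong nuisance factor u,
    while the label depends only on a smaller differential factor d."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0.0, nuisance_sd)  # shared nuisance, large variance
        d = rng.gauss(0.0, signal_sd)    # differential signal, small variance
        x1 = math.exp(u + d / 2.0)
        x2 = math.exp(u - d / 2.0)
        y = 1 if d > signal_sd else 0    # assumed threshold, minority positive class
        rows.append({"x1": x1, "x2": x2, "d": d, "y": y})
    return rows


def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rows = make_cancellation_rows(2000)
d = [r["d"] for r in rows]
log_x1 = [math.log(r["x1"]) for r in rows]
# log(x1 / x2) = (u + d/2) - (u - d/2) = d, so the nuisance cancels.
log_ratio = [math.log(r["x1"] / r["x2"]) for r in rows]

corr_primitive = pearson(log_x1, d)   # weak: dominated by the nuisance u
corr_ratio = pearson(log_ratio, d)    # essentially 1: signal isolated

# Subsampling rates from the abstract; the 1.0 baseline and the
# one-parameter-at-a-time layout are assumptions of this sketch.
rates = [0.4, 0.6, 0.8, 0.9]
param_grid = [{"colsample_bylevel": 1.0, "colsample_bynode": 1.0}]
param_grid += [{"colsample_bylevel": s, "colsample_bynode": 1.0} for s in rates]
param_grid += [{"colsample_bylevel": 1.0, "colsample_bynode": s} for s in rates]
```

Under these assumptions, a single primitive feature is only weakly related to the signal, while the log ratio recovers it almost exactly. That is why a tree without the engineered ratio has to coordinate splits on both x1 and x2, and why hiding one of them at a level or node could interrupt that coordination. Each dictionary in param_grid could be passed as keyword arguments to an XGBoost estimator such as xgboost.XGBClassifier, once with the primitive features only and once with the ratio added as a control.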
Taxonomy
Topics: Face and Expression Recognition · Neural Networks and Applications · VLSI and FPGA Design Techniques
