Scaling Up Diffusion and Flow-based XGBoost Models
Jesse C. Cresswell, Taewoo Kim

TL;DR
This paper shows that, with a better implementation, diffusion and flow-matching models that use XGBoost as the function approximator can be scaled to much larger datasets and model sizes, which improves performance on tabular data generation benchmarks relevant to scientific applications.
Contribution
The authors provide an efficient implementation of XGBoost-based diffusion and flow-matching models for tabular data, enabling scaling to datasets 370 times larger than previously used and improving generative performance.
Findings
The efficient implementation handles datasets 370x larger than previously used.
Scaling models to larger sizes directly improves performance on benchmark tasks.
Algorithmic enhancements like multi-output trees benefit resource use and accuracy.
Abstract
Novel machine learning methods for tabular data generation are often developed on small datasets which do not match the scale required for scientific applications. We investigate a recent proposal to use XGBoost as the function approximator in diffusion and flow-matching models on tabular data, which proved to be extremely memory intensive, even on tiny datasets. In this work, we conduct a critical analysis of the existing implementation from an engineering perspective, and show that these limitations are not fundamental to the method; with better implementation it can be scaled to datasets 370x larger than previously used. Our efficient implementation also unlocks scaling models to much larger sizes which we show directly leads to improved performance on benchmark tasks. We also propose algorithmic improvements that can further benefit resource usage and model performance, including…
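To make the idea of "a function approximator in a flow-matching model" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the common straight-line interpolation between Gaussian noise and data, and one regressor per discretized time level. All names (make_flow_matching_data, LinearStandIn, train, sample) are hypothetical, and a least-squares regressor stands in where the paper uses XGBoost so that the example depends only on NumPy.

```python
import numpy as np

def make_flow_matching_data(x1, t, rng):
    """Build one regression dataset (inputs, targets) for time level t."""
    x0 = rng.standard_normal(x1.shape)   # one noise row paired with each data row
    xt = (1.0 - t) * x0 + t * x1         # point on the straight path from noise to data
    target = x1 - x0                     # velocity along that path (assumed convention)
    return xt, target

class LinearStandIn:
    """Least-squares regressor; a stand-in for the XGBoost regressor in the paper."""
    def fit(self, X, Y):
        A = np.hstack([X, np.ones((X.shape[0], 1))])    # append an intercept column
        self.W, *_ = np.linalg.lstsq(A, Y, rcond=None)
        return self

    def predict(self, X):
        A = np.hstack([X, np.ones((X.shape[0], 1))])
        return A @ self.W

def train(x1, n_levels, rng):
    """Fit one regressor per time level on [0, 1)."""
    ts = np.linspace(0.0, 1.0, n_levels, endpoint=False)
    return [LinearStandIn().fit(*make_flow_matching_data(x1, t, rng)) for t in ts]

def sample(models, n_samples, dim, rng):
    """Generate rows by Euler integration of the learned velocity field."""
    x = rng.standard_normal((n_samples, dim))   # start from pure noise at t = 0
    h = 1.0 / len(models)                       # step size between time levels
    for model in models:
        x = x + h * model.predict(x)
    return x

rng = np.random.default_rng(0)
data = rng.normal(loc=[3.0, -2.0], scale=0.5, size=(2000, 2))   # toy "tabular" data
models = train(data, n_levels=20, rng=rng)
generated = sample(models, n_samples=1000, dim=2, rng=rng)
```

In this sketch each time level gets its own regression dataset of the same size as the original data, so the training-set footprint grows with the number of time levels. That is one plausible place where an implementation's memory use can grow, though the abstract does not say which factors dominated in the existing implementation.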
Taxonomy
Topics: Iterative Learning Control Systems · Model Reduction and Neural Networks · Real-time simulation and control systems
Methods: Diffusion
