KMLP: A Scalable Hybrid Architecture for Web-Scale Tabular Data Modeling
Mingming Zhang, Pengfei Shi, Zhiqing Xiao, Feng Zhao, Guandong Sun, Yulin Kang, Ruizhe Gao, Ningtao Wang, Xing Fu, Weiqiang Wang, Junbo Zhao

TL;DR
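KMLP is a scalable hybrid deep learning architecture for web-scale tabular data. It pairs a shallow Kolmogorov-Arnold Network (KAN) front-end, which learns a non-linear transformation for each feature, with a Gated Multilayer Perceptron (gMLP) backbone that captures high-order feature interactions. The authors report state-of-the-art results on public benchmarks and on an industrial dataset with billions of samples.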
Contribution
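The paper introduces KMLP, a hybrid architecture for large-scale tabular data with hundreds of heterogeneous numerical features. It learns per-feature transformations automatically, reducing reliance on manual feature engineering, and is reported to outperform baselines such as Gradient Boosting Decision Trees (GBDTs) while scaling to billions of instances.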
Findings
KMLP outperforms GBDTs on large-scale benchmarks.
Advantages of KMLP increase with data size.
Validated on an industrial dataset with billions of samples.
Abstract
Predictive modeling on web-scale tabular data with billions of instances and hundreds of heterogeneous numerical features faces significant scalability challenges. These features exhibit anisotropy, heavy-tailed distributions, and non-stationarity, creating bottlenecks for models like Gradient Boosting Decision Trees and requiring laborious manual feature engineering. We introduce KMLP, a hybrid deep architecture integrating a shallow Kolmogorov-Arnold Network (KAN) front-end with a Gated Multilayer Perceptron (gMLP) backbone. The KAN front-end uses learnable activation functions to automatically model complex non-linear transformations for each feature, while the gMLP backbone captures high-order interactions. Experiments on public benchmarks and an industrial dataset with billions of samples show KMLP achieves state-of-the-art performance, with advantages over baselines like GBDTs…
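The two-stage design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one learnable 1-D function per feature built from Gaussian radial basis functions (a stand-in for the B-spline parameterization commonly used in KANs), and a simplified residual gating block in place of a full gMLP layer. All names (`FeatureActivation`, `gated_block`, `kmlp_forward`, `init_block`) are hypothetical, and only the forward pass is shown.

```python
import math
import random


def rbf_basis(x, centers, width):
    """Evaluate Gaussian bumps at scalar x (assumed stand-in for a spline basis)."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]


class FeatureActivation:
    """One learnable 1-D function: phi(x) = sum_k coef[k] * basis_k(x)."""

    def __init__(self, centers, width, rng):
        self.centers = centers
        self.width = width
        # Coefficients would be trained; here they are only randomly initialized.
        self.coef = [rng.gauss(0.0, 0.1) for _ in centers]

    def __call__(self, x):
        basis = rbf_basis(x, self.centers, self.width)
        return sum(w * b for w, b in zip(self.coef, basis))


def linear(vec, weight, bias):
    """Dense layer: weight is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def gated_block(h, w_value, b_value, w_gate, b_gate):
    """Simplified gating: a value projection scaled elementwise by a linear
    gate, added back to the input. The product of two projections of h is
    what lets the block represent multiplicative feature interactions."""
    value = linear(h, w_value, b_value)
    gate = linear(h, w_gate, b_gate)
    return [hi + v * g for hi, v, g in zip(h, value, gate)]


def init_block(dim, rng):
    """Random parameters (w_value, b_value, w_gate, b_gate) for one block."""
    def mat():
        return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
    return mat(), [0.0] * dim, mat(), [0.0] * dim


def kmlp_forward(x, activations, blocks):
    """Front-end: transform each raw feature with its own learned function.
    Backbone: pass the transformed vector through stacked gated blocks."""
    h = [phi(xj) for phi, xj in zip(activations, x)]
    for params in blocks:
        h = gated_block(h, *params)
    return h


if __name__ == "__main__":
    rng = random.Random(0)
    n_features = 4
    centers = [-2.0, -1.0, 0.0, 1.0, 2.0]
    acts = [FeatureActivation(centers, 1.0, rng) for _ in range(n_features)]
    blocks = [init_block(n_features, rng) for _ in range(2)]
    # Raw features on very different scales, as is typical for tabular data.
    print(kmlp_forward([0.3, -1.7, 250.0, 0.0], acts, blocks))
```

Because each feature gets its own function, the front-end can in principle learn a different rescaling or saturation for each column before any interaction is modeled. Note that fixed-width Gaussian bases vanish far from their centers, so a heavy-tailed input such as `250.0` maps to roughly zero in this sketch; a practical version would need a basis or normalization suited to the feature's range.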
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)
