KMLP: A Scalable Hybrid Architecture for Web-Scale Tabular Data Modeling

Mingming Zhang; Pengfei Shi; Zhiqing Xiao; Feng Zhao; Guandong Sun; Yulin Kang; Ruizhe Gao; Ningtao Wang; Xing Fu; Weiqiang Wang; Junbo Zhao

arXiv:2602.22777·cs.LG·February 27, 2026

TL;DR

KMLP is a scalable hybrid deep learning architecture for web-scale tabular data. It combines a KAN front-end that learns a non-linear transformation for each feature with a gMLP backbone that captures high-order interactions, and it reports state-of-the-art results on public benchmarks and an industrial dataset with billions of samples.

Contribution

The paper introduces KMLP, a hybrid architecture that pairs a shallow Kolmogorov-Arnold Network (KAN) front-end with a Gated Multilayer Perceptron (gMLP) backbone to model large-scale tabular data. It outperforms baselines such as GBDTs and scales to billions of samples.

Findings

01 KMLP outperforms GBDTs on large-scale benchmarks.

02 Advantages of KMLP increase with data size.

03 Validated on industrial datasets with billions of samples.

Abstract

Predictive modeling on web-scale tabular data with billions of instances and hundreds of heterogeneous numerical features faces significant scalability challenges. These features exhibit anisotropy, heavy-tailed distributions, and non-stationarity, creating bottlenecks for models like Gradient Boosting Decision Trees and requiring laborious manual feature engineering. We introduce KMLP, a hybrid deep architecture integrating a shallow Kolmogorov-Arnold Network (KAN) front-end with a Gated Multilayer Perceptron (gMLP) backbone. The KAN front-end uses learnable activation functions to automatically model complex non-linear transformations for each feature, while the gMLP backbone captures high-order interactions. Experiments on public benchmarks and an industrial dataset with billions of samples show KMLP achieves state-of-the-art performance, with advantages over baselines like GBDTs…
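
The two-stage design described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation, and everything in it is an assumption made for illustration: the names `FeatureActivation`, `GatedBlock`, `build_model`, and `kmlp_forward` are hypothetical, Gaussian bumps stand in for the spline basis a KAN would normally use, the front-end applies one learnable function per feature rather than a full KAN layer, the gating is a simplified channel-wise gate rather than the exact gMLP unit, and the weights are random rather than trained.

```python
import math
import random


def gelu(t):
    """Exact GELU via the Gaussian error function."""
    return 0.5 * t * (1.0 + math.erf(t / math.sqrt(2.0)))


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def matvec(W, v, b):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def rand_matrix(rows, cols, rng):
    """Random matrix with entries scaled by 1/sqrt(cols)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class FeatureActivation:
    """Learnable 1-D function for one feature: phi(x) = sum_k coef[k] * bump_k(x).

    Assumption: Gaussian bumps on a fixed grid replace a spline basis.
    """

    def __init__(self, n_basis, rng, lo=-3.0, hi=3.0):
        step = (hi - lo) / (n_basis - 1)
        self.centers = [lo + k * step for k in range(n_basis)]
        self.width = step
        # These coefficients are the parameters training would adjust.
        self.coef = [rng.gauss(0.0, 0.5) for _ in range(n_basis)]

    def __call__(self, x):
        return sum(c * math.exp(-((x - m) / self.width) ** 2)
                   for c, m in zip(self.coef, self.centers))


class GatedBlock:
    """Simplified gated MLP block with a residual connection (assumption)."""

    def __init__(self, dim, hidden, rng):
        self.hidden = hidden
        self.W_in, self.b_in = rand_matrix(2 * hidden, dim, rng), [0.0] * (2 * hidden)
        # Gate bias of 1.0 is an arbitrary choice for this sketch.
        self.W_gate, self.b_gate = rand_matrix(hidden, hidden, rng), [1.0] * hidden
        self.W_out, self.b_out = rand_matrix(dim, hidden, rng), [0.0] * dim

    def __call__(self, x):
        h = [gelu(z) for z in matvec(self.W_in, x, self.b_in)]
        u, v = h[:self.hidden], h[self.hidden:]
        gate = matvec(self.W_gate, v, self.b_gate)
        gated = [a * g for a, g in zip(u, gate)]  # multiplicative interaction
        out = matvec(self.W_out, gated, self.b_out)
        return [xi + oi for xi, oi in zip(x, out)]


def build_model(n_features, seed, n_basis=8, hidden=8, n_blocks=2):
    """Create untrained parameters for the sketch."""
    rng = random.Random(seed)
    activations = [FeatureActivation(n_basis, rng) for _ in range(n_features)]
    blocks = [GatedBlock(n_features, hidden, rng) for _ in range(n_blocks)]
    w_head = [rng.gauss(0.0, 0.5) for _ in range(n_features)]
    return activations, blocks, w_head, 0.0


def kmlp_forward(row, activations, blocks, w_head, b_head):
    """Per-feature transforms, then gated blocks, then a logistic head."""
    z = [phi(x) for phi, x in zip(activations, row)]
    for block in blocks:
        z = block(z)
    return sigmoid(sum(w * zi for w, zi in zip(w_head, z)) + b_head)


activations, blocks, w_head, b_head = build_model(n_features=4, seed=0)
row = [0.5, -1.2, 250.0, 0.0]  # made-up feature values, one far outside the grid
prob = kmlp_forward(row, activations, blocks, w_head, b_head)
```

The split mirrors the division of labor in the abstract: each `FeatureActivation` sees only its own column, so it can reshape a skewed or heavy-tailed feature independently, while the multiplicative gate in `GatedBlock` is where products of transformed features, and hence interactions, can arise.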

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)