MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification
Bo Zheng, Yudong Chen, Zihua Xiong, Shuai Fang, Peidong He, Yang Yang, Sheng Guo

TL;DR
MaskTab introduces a scalable, self-supervised pretraining framework for industrial tabular data, effectively handling missing values and enabling robust, lightweight models for high-stakes decision systems.
Contribution
MaskTab is a unified pretraining approach for industrial tabular data that combines learnable missing-value tokens, a hybrid supervised pretraining scheme, and distillation into lightweight models.
Findings
Achieves +5.04% AUC and +8.28% KS improvements on benchmarks.
Distills into lightweight models with +2.55% AUC and +4.85% KS gains.
Enhances robustness to distribution shifts in industrial datasets.
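For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how AUC and KS are commonly computed for a binary classifier. It assumes KS means the Kolmogorov-Smirnov statistic as used in credit scoring (the largest gap between the true-positive and false-positive rates over all thresholds); the summary above does not define it, and the function name `auc_and_ks` is hypothetical, not taken from the paper.

```python
def auc_and_ks(labels, scores):
    """Return (AUC, KS) for binary labels (0/1) and real-valued scores.

    Illustrative only: a quadratic-time reference version, not the
    paper's evaluation code.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # AUC: probability that a random positive outscores a random negative,
    # with ties counted as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    # KS (assumed definition): maximum gap between the true-positive rate
    # and the false-positive rate across all score thresholds.
    ks = 0.0
    for t in sorted(set(scores)):
        tpr = sum(p >= t for p in pos) / len(pos)
        fpr = sum(n >= t for n in neg) / len(neg)
        ks = max(ks, abs(tpr - fpr))
    return auc, ks


if __name__ == "__main__":
    auc, ks = auc_and_ks([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
    print(f"AUC={auc:.2f} KS={ks:.2f}")
```

Both metrics depend only on how the scores rank the examples, which is one reason they are popular for risk models whose raw outputs are not calibrated probabilities.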
Abstract
Tabular data forms the backbone of high-stakes decision systems in finance, healthcare, and beyond. Yet industrial tabular datasets are inherently difficult: high-dimensional, riddled with missing entries, and rarely labeled at scale. While foundation models have revolutionized vision and language, tabular learning still leans on handcrafted features and lacks a general self-supervised framework. We present MaskTab, a unified pre-training framework designed specifically for industrial-scale tabular data. MaskTab encodes missing values via dedicated learnable tokens, enabling the model to distinguish structural absence from random dropout. It jointly optimizes a hybrid supervised pre-training scheme--utilizing a twin-path architecture to reconcile masked reconstruction with task-specific supervision--and an MoE-augmented loss that adaptively routes features through specialized…
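One way to picture the missing-value encoding described in the abstract is to give "absent in the raw data" and "hidden for pretraining" separate token ids, each of which would index its own learnable embedding row. The sketch below is a rough illustration under stated assumptions, not the authors' implementation: the ids `MISSING_ID` and `MASK_ID`, the uniform binning of features scaled to [0, 1], and the function names are all hypothetical, and reading "random dropout" as pretraining-time masking is an interpretation of the abstract.

```python
import math
import random

# Hypothetical special-token ids (not from the paper).
MISSING_ID = 0   # the value was absent in the raw data (structural absence)
MASK_ID = 1      # the value was present but hidden for masked reconstruction
NUM_SPECIAL = 2


def tokenize_row(row, num_bins=8, mask_prob=0.3, rng=None):
    """Map a row of floats in [0, 1] (None or NaN when missing) to token ids.

    Returns (token_ids, targets). targets[i] holds the original bin id for
    cells that were masked, and None elsewhere. Missing cells are never
    masked, so they never become reconstruction targets.
    """
    rng = rng or random.Random(0)
    token_ids, targets = [], []
    for value in row:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            token_ids.append(MISSING_ID)
            targets.append(None)
            continue
        # Assumed preprocessing: uniform binning of a value scaled to [0, 1].
        bin_id = NUM_SPECIAL + min(int(value * num_bins), num_bins - 1)
        if rng.random() < mask_prob:
            token_ids.append(MASK_ID)
            targets.append(bin_id)
        else:
            token_ids.append(bin_id)
            targets.append(None)
    return token_ids, targets


def init_embeddings(vocab_size, dim, rng=None):
    """Randomly initialised embedding table; in a real model these rows
    would be trainable parameters, including the two special rows."""
    rng = rng or random.Random(0)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]


if __name__ == "__main__":
    ids, targets = tokenize_row([0.1, None, 0.95, float("nan")], mask_prob=0.5)
    table = init_embeddings(NUM_SPECIAL + 8, dim=4)
    print(ids, targets, len(table))
```

Keeping the two ids separate means the model always receives a different input for a genuinely missing cell than for a deliberately masked one, which is the distinction the abstract emphasises.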
