MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification

Bo Zheng; Yudong Chen; Zihua Xiong; Shuai Fang; Peidong He; Yang Yang; Sheng Guo

arXiv:2605.11408 · cs.LG · May 13, 2026

TL;DR

MaskTab introduces a scalable, self-supervised pretraining framework for industrial tabular data, effectively handling missing values and enabling robust, lightweight models for high-stakes decision systems.

Contribution

The paper presents a unified pretraining approach for complex tabular data that combines missing-value encoding, a hybrid training scheme, and model distillation.

Findings

01. Achieves +5.04% AUC and +8.28% KS improvements on benchmarks.

02. Distills into lightweight models with +2.55% AUC and +4.85% KS gains.

03. Enhances robustness to distribution shifts in industrial datasets.
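
The findings are reported in AUC and KS. The sketch below shows one common way to compute both for a binary classifier. It assumes KS means the Kolmogorov-Smirnov statistic as used in credit scoring (the largest gap between the cumulative positive and negative rates over score thresholds); the page does not define the metric, and the function names `auc` and `ks_statistic` are illustrative, not taken from the paper.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. O(n_pos * n_neg), fine for small examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(labels, scores):
    """Largest gap between cumulative positive and negative rates.

    Assumed definition: max over thresholds of |TPR - FPR|.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        # Consume every record that shares the current score, so a tie
        # is treated as a single threshold.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
        i = j
    return best


# Toy data, made up for illustration only.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auc(example_labels, example_scores)
example_ks = ks_statistic(example_labels, example_scores)
```

Whether the reported percentage gains are absolute or relative to a baseline is not stated on this page, so the sketch computes only the raw metrics.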

Abstract

Tabular data forms the backbone of high-stakes decision systems in finance, healthcare, and beyond. Yet industrial tabular datasets are inherently difficult: high-dimensional, riddled with missing entries, and rarely labeled at scale. While foundation models have revolutionized vision and language, tabular learning still leans on handcrafted features and lacks a general self-supervised framework. We present MaskTab, a unified pre-training framework designed specifically for industrial-scale tabular data. MaskTab encodes missing values via dedicated learnable tokens, enabling the model to distinguish structural absence from random dropout. It jointly optimizes a hybrid supervised pre-training scheme--utilizing a twin-path architecture to reconcile masked reconstruction with task-specific supervision--and an MoE-augmented loss that adaptively routes features through specialized…
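
The distinction the abstract draws between structural absence and random dropout can be illustrated with a small sketch. It is not the authors' implementation: in MaskTab the tokens are learnable embeddings, whereas here they are plain string placeholders, and the names `MISSING`, `MASK`, `mask_row`, and `mask_prob` are hypothetical. The point shown is only that a value absent from the source record gets a different token from a value hidden on purpose, and only the latter becomes a reconstruction target.

```python
import random

MISSING = "[MISSING]"  # value was absent in the raw record
MASK = "[MASK]"        # value was hidden on purpose for reconstruction


def mask_row(row, mask_prob, rng):
    """Tokenize one record for masked reconstruction.

    row: list of feature values, with None marking a missing entry.
    Returns (tokens, targets), where targets maps a masked position
    to the original value the model would be asked to reconstruct.
    """
    tokens = []
    targets = {}
    for idx, value in enumerate(row):
        if value is None:
            # Structurally missing: keep a dedicated token and never
            # use it as a target, since there is no ground truth.
            tokens.append(MISSING)
        elif rng.random() < mask_prob:
            # Randomly dropped: the true value is known and is kept
            # aside as the reconstruction target.
            tokens.append(MASK)
            targets[idx] = value
        else:
            tokens.append(value)
    return tokens, targets


# Toy record, made up for illustration only.
record = [1.0, None, 3.5, None, 2.0]
tokens, targets = mask_row(record, mask_prob=0.5, rng=random.Random(0))
```

Keeping the two tokens separate means a downstream encoder can, in principle, learn that missingness itself carries signal, instead of conflating it with the training-time corruption.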
