Rethinking Pre-Training in Tabular Data: A Neighborhood Embedding Perspective
Han-Jia Ye, Qi-Le Zhou, Huai-Hong Yin, De-Chuan Zhan, Wei-Lun Chao

TL;DR
This paper introduces TabPTM, a novel pre-training approach for tabular data that uses neighborhood-based meta-representations to handle heterogeneity across datasets, enabling effective transfer learning without fine-tuning.
Contribution
The paper proposes a neighborhood embedding method for pre-training on diverse tabular datasets, transforming heterogeneous tasks into homogeneous local prediction problems.
Findings
TabPTM achieves superior performance on 101 datasets.
It is effective in both classification and regression tasks.
It works on new datasets without fine-tuning.
Abstract
Pre-training is prevalent in deep learning for vision and text data, leveraging knowledge from other datasets to enhance downstream tasks. However, for tabular data, the inherent heterogeneity in attribute and label spaces across datasets complicates the learning of shareable knowledge. We propose Tabular data Pre-Training via Meta-representation (TabPTM), aiming to pre-train a general tabular model over diverse datasets. The core idea is to embed data instances into a shared feature space, where each instance is represented by its distance to a fixed number of nearest neighbors and their labels. This "meta-representation" transforms heterogeneous tasks into homogeneous local prediction problems, enabling the model to infer labels (or scores for each label) based on neighborhood information. As a result, the pre-trained TabPTM can be applied directly to new datasets, regardless of…
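Illustration
The neighborhood idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name meta_representation, the Euclidean metric, the choice k=3, and the toy datasets are all assumptions made here, and the paper's actual construction may differ. The sketch shows only why two datasets with different numbers of attributes end up with representations of the same length.

```python
import numpy as np

def meta_representation(x, X_train, y_train, k=3):
    """Hypothetical helper: describe x by its k nearest training neighbors.

    Returns a vector of length 2*k: the k smallest distances (ascending),
    followed by the labels of those neighbors. Euclidean distance is an
    assumption of this sketch.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    # Stable sort so ties keep their original order.
    order = np.argsort(dists, kind="stable")[:k]
    return np.concatenate([dists[order], y_train[order].astype(float)])

# Toy dataset A: 2 attributes.
X_a = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 4.0], [6.0, 8.0]])
y_a = np.array([1, 0, 1, 0])

# Toy dataset B: 5 attributes.
X_b = np.array([
    [2.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 3.0],
    [1.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 4.0, 0.0, 0.0, 0.0],
])
y_b = np.array([0, 1, 1, 0])

rep_a = meta_representation(np.zeros(2), X_a, y_a, k=3)
rep_b = meta_representation(np.zeros(5), X_b, y_b, k=3)

# Both representations have length 2*k, regardless of attribute count.
print(rep_a.shape, rep_b.shape)
```

Because both vectors have the same length, a single downstream model could in principle consume instances from either dataset, which is the property the abstract relies on when it describes turning heterogeneous tasks into homogeneous local prediction problems.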
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Advanced Neural Network Applications
Methods: Sparse Evolutionary Training
