Scaling Experiments in Self-Supervised Cross-Table Representation Learning

Maximilian Schambach; Dominique Paul; Johannes S. Otterbach

arXiv:2309.17339·cs.LG·October 2, 2023

Open Access

TL;DR

This paper introduces a Transformer-based architecture for tabular representation learning, studies how it scales from roughly 10^4 to 10^7 parameters when pretrained on 135M tokens drawn from 76 datasets, and evaluates the pretrained models via linear probing.

Contribution

It presents a novel Transformer architecture tailored for tabular data and systematically studies its scaling properties across different model sizes and training setups.

Findings

01

Scaling improves performance on benchmark datasets.

02

Cross-table pretraining enhances generalization.

03

Models with up to roughly 10^7 parameters are feasible to train and effective.

Abstract

To analyze the scaling potential of deep tabular representation learning models, we introduce a novel Transformer-based architecture specifically tailored to tabular data and cross-table representation learning by utilizing table-specific tokenizers and a shared Transformer backbone. Our training approach encompasses both single-table and cross-table models, trained via missing value imputation through a self-supervised masked cell recovery objective. To understand the scaling behavior of our method, we train models of varying sizes, ranging from approximately $10^{4}$ to $10^{7}$ parameters. These models are trained on a carefully curated pretraining dataset, consisting of 135M training tokens sourced from 76 diverse datasets. We assess the scaling of our architecture in both single-table and cross-table pretraining setups by evaluating the pretrained models using linear probing on a…
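As a rough illustration of the masked cell recovery objective mentioned in the abstract, the sketch below hides a random subset of cells in one table row and records the original values as recovery targets. It is not the authors' implementation: the dict-based row representation, the `[MASK]` placeholder, the function name `mask_cells`, and the masking probability are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder for a hidden cell (assumption)


def mask_cells(row, mask_prob, rng):
    """Hide each cell independently with probability mask_prob.

    Returns (masked_row, targets), where targets maps each masked
    column name to the original value the model would have to recover.
    """
    masked, targets = {}, {}
    for col, val in row.items():
        if rng.random() < mask_prob:
            masked[col] = MASK   # the model sees only the placeholder
            targets[col] = val   # the original value becomes the label
        else:
            masked[col] = val    # unmasked cells pass through unchanged
    return masked, targets


# Example row with mixed numerical and categorical cells (made-up values).
rng = random.Random(0)
row = {"age": 42, "income": 55000.0, "city": "Berlin"}
masked_row, targets = mask_cells(row, mask_prob=0.5, rng=rng)
```

In a full training loop, `masked_row` would be tokenized and passed through the model, and a loss would be computed only on the columns listed in `targets`; those steps are omitted here because the page does not describe them.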

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · AI in cancer detection

Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Linear Layer · Dropout · Byte Pair Encoding · Label Smoothing · Absolute Position Encodings · Adam · Softmax