UniTabE: A Universal Pretraining Protocol for Tabular Foundation Model in Data Science

Yazheng Yang, Yuqi Wang, Guang Liu, Ledell Wu, Qi Liu

arXiv:2307.09249 · cs.LG · March 14, 2024

TL;DR

UniTabE introduces a universal pretraining protocol for tabular data that handles diverse table structures and improves performance on classification and regression tasks across large-scale benchmarks.

Contribution

This work presents UniTabE, a method for pretraining on tables with varied schemas using a module-based representation and a Transformer encoder, which improves transferability and generalization across tasks.
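
The page does not describe the internals of the module-based representation. The following is a minimal sketch of the general idea of schema-agnostic input, assuming each cell is turned into a (column name, value type, value) unit and the units are flattened into one token stream that a single shared encoder could consume. `CellUnit`, `row_to_units`, `units_to_tokens`, the three coarse type labels, and the `[NUMERICAL]`-style marker tokens are hypothetical illustrations, not UniTabE's actual design.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class CellUnit:
    # Hypothetical per-cell unit: column name, coarse value type, value text.
    column: str
    dtype: str
    text: str


def infer_dtype(value: Any) -> str:
    # Assumed three-way typing. bool is checked first because bool is a
    # subclass of int in Python.
    if isinstance(value, bool):
        return "categorical"
    if isinstance(value, (int, float)):
        return "numerical"
    return "textual"


def row_to_units(row: Dict[str, Any]) -> List[CellUnit]:
    # Every non-missing cell becomes one unit, whatever the table's schema.
    units = []
    for column, value in row.items():
        if value is None:
            continue  # skip missing cells
        units.append(CellUnit(column=column, dtype=infer_dtype(value), text=str(value)))
    return units


def units_to_tokens(units: List[CellUnit]) -> List[str]:
    # Flatten units into one token stream; the column name travels with the
    # value, so no fixed column order or column count is required.
    tokens: List[str] = []
    for unit in units:
        tokens.extend([f"[{unit.dtype.upper()}]", unit.column, ":", unit.text, "[SEP]"])
    return tokens


if __name__ == "__main__":
    # Two rows from tables with unrelated schemas share one input format.
    loan_row = {"age": 42, "income": 55000.0, "occupation": "teacher"}
    churn_row = {"tenure_months": 18, "plan": "premium", "auto_renew": True}
    for row in (loan_row, churn_row):
        print(units_to_tokens(row_to_units(row)))
```

Because each cell carries its own column name, a column added to a table later simply contributes extra units; the input format itself does not change, which is one way to accommodate incremental columns over time.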

Findings

01 Outperforms several baselines on large benchmarks

02 Demonstrates strong transferability across tasks

03 Effectively handles diverse table structures

Abstract

Recent advancements in NLP have witnessed the groundbreaking impact of pretrained models, yielding impressive outcomes across various tasks. This study seeks to extend the power of pretraining methodologies to facilitate prediction over tables in data science, a domain traditionally overlooked, yet inherently challenging due to the plethora of table schemas intrinsic to different tasks. The primary research questions underpinning this work revolve around the establishment of a universal pretraining protocol for tables with varied structures, the generalizability and transferability of learned knowledge across tasks, the adaptation to diverse downstream applications, and the incorporation of incremental columns over time. In response to these challenges, we introduce UniTabE, a straightforward yet effective method designed to process tables in a uniform manner, devoid of…

Taxonomy

Topics: Data Quality and Management · Time Series Analysis and Forecasting · Data Stream Mining Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection · Absolute Position Encodings · Adam · Layer Normalization · Label Smoothing