TransTab: Learning Transferable Tabular Transformers Across Tables

Zifeng Wang; Jimeng Sun

arXiv:2205.09328·cs.LG·September 19, 2022·37 cites

TL;DR

TransTab introduces a transferable transformer model for tabular data that handles varying table structures, enabling better generalization, incremental learning, and transfer learning across diverse datasets with minimal data preprocessing.

Contribution

The paper presents TransTab, a novel transformer-based approach that learns transferable embeddings from tabular data, accommodating varying table structures and enabling pretraining and incremental updates.

Findings

01. TransTab outperforms 11 baseline methods across multiple benchmarks.

02. Pretraining with TransTab improves AUC by 2.3% on average over supervised learning alone.

03. TransTab effectively handles unseen tables with different columns.

Abstract

Tabular data (or tables) are the most widely used data format in machine learning (ML). However, ML models often assume the table structure stays fixed between training and testing. Before ML modeling, heavy data cleaning is required to merge disparate tables with different columns. This preprocessing often incurs significant data waste (e.g., removing unmatched columns and samples). How can we learn ML models from multiple tables with partially overlapping columns? How can we incrementally update ML models as more columns become available over time? Can we leverage model pretraining on multiple distinct tables? How can we train an ML model that can predict on an unseen table? To answer all of these questions, we propose to relax fixed table structures by introducing a Transferable Tabular Transformer (TransTab) for tables. The goal of TransTab is to convert each sample (a row in the table) to a…
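The data waste the abstract describes can be made concrete with a small sketch. The code below is not TransTab and does not come from the paper or its repository. It only illustrates the conventional fixed-schema preprocessing that the abstract argues against, in which tables are merged by keeping the columns they share. The two tables, their column names, and the helper functions (`columns_of`, `merge_on_shared_columns`, `count_cells`) are all hypothetical and chosen for illustration.

```python
# Two hypothetical tables with partially overlapping columns (toy data, not from the paper).
table_a = [
    {"age": 34, "gender": "F", "blood_pressure": 120, "label": 0},
    {"age": 51, "gender": "M", "blood_pressure": 135, "label": 1},
]
table_b = [
    {"age": 47, "gender": "M", "weight": 82.5, "smoker": True, "label": 1},
    {"age": 29, "gender": "F", "weight": 60.0, "smoker": False, "label": 0},
    {"age": 63, "gender": "F", "weight": 71.2, "smoker": True, "label": 1},
]


def columns_of(table):
    """Return the set of column names used by any row of the table."""
    cols = set()
    for row in table:
        cols.update(row)
    return cols


def merge_on_shared_columns(*tables):
    """Fixed-schema merge: keep only the columns present in every table."""
    shared = set.intersection(*(columns_of(t) for t in tables))
    merged = [{c: row[c] for c in sorted(shared)} for t in tables for row in t]
    return merged, shared


def count_cells(table):
    """Count the individual cell values stored in a table."""
    return sum(len(row) for row in table)


merged, shared = merge_on_shared_columns(table_a, table_b)
cells_before = count_cells(table_a) + count_cells(table_b)
cells_after = count_cells(merged)
dropped = (columns_of(table_a) | columns_of(table_b)) - shared

print("shared columns:", sorted(shared))
print("dropped columns:", sorted(dropped))
print(f"cells kept: {cells_after} of {cells_before}")
```

On this toy data the merge keeps 15 of the 23 original cells and discards `blood_pressure`, `smoker`, and `weight` entirely, which is the kind of loss a model that accepts variable columns is meant to avoid.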

Code & Models

Repositories

ryanwangzf/transtab · PyTorch · Official

Videos

TransTab: Learning Transferable Tabular Transformers Across Tables · SlidesLive

Taxonomy

Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare · Machine Learning and Data Classification

Methods: Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Adam · Absolute Position Encodings · Byte Pair Encoding · Residual Connection · Label Smoothing