CARTE: Pretraining and Transfer for Tabular Learning

Myung Jun Kim, Léo Grinsztajn, and Gaël Varoquaux

arXiv:2402.16785 · cs.LG · June 3, 2024 · 1 cites

TL;DR

CARTE introduces a neural architecture for tabular data that enables pretraining and transfer learning without requiring matched entries or schemas, outperforming traditional models and facilitating joint learning across unmatched tables.

Contribution

The paper presents CARTE, a novel graph-based neural model that handles unmatched tabular data for pretraining and transfer learning, overcoming key challenges in schema and entity matching.

Findings

01 CARTE outperforms traditional tree-based models in benchmarks.

02 It enables joint learning across unmatched tables.

03 Pretraining improves learning efficiency on tabular data.

Abstract

Pretrained deep-learning models are the go-to solution for images or text. However, for tabular data the standard is still to train tree-based models. Indeed, transfer learning on tables hits the challenge of data integration: finding correspondences in the entries (entity matching), where different words may denote the same entity, and correspondences across columns (schema matching), which may come in different orders, names... We propose a neural architecture that does not need such correspondences. As a result, we can pretrain it on background data that has not been matched. The architecture -- CARTE for Context Aware Representation of Table Entries -- uses a graph representation of tabular (or relational) data to process tables with different columns, string embedding of entries and column names to model an open vocabulary, and a graph-attentional network to…
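To make the graph representation described in the abstract more concrete, here is a minimal Python sketch of how one table row might become a small star-shaped graph whose edges carry column-name embeddings and whose leaf nodes carry entry embeddings. It is an illustration under stated assumptions, not the authors' implementation: `toy_string_embedding` is a hypothetical hash-based stand-in for a pretrained string embedder, `row_to_graph` and `DIM` are names invented here, and the handling of numeric cells (scaling the column-name embedding by the value) is an assumption. The graph-attentional network itself is not shown.

```python
import hashlib
import math

DIM = 8  # illustrative embedding size (assumption, not from the paper)


def toy_string_embedding(text, dim=DIM):
    """Hypothetical stand-in for a pretrained string embedder.

    Hashes the lowercased string into a deterministic unit-length vector,
    so any string (an open vocabulary) maps to some embedding.
    """
    digest = hashlib.sha256(text.lower().encode("utf-8")).digest()
    vec = [b / 255.0 - 0.5 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def row_to_graph(row):
    """Turn one table row (a dict) into a star graph.

    Node 0 is a center node standing for the whole row. Each non-missing
    cell becomes a leaf node, linked to the center by an edge whose feature
    is the embedding of the column name.
    """
    nodes = [[0.0] * DIM]  # center node, zero-initialized in this sketch
    edges = []             # (source, target, edge_feature) triples
    for column, value in row.items():
        if value is None:
            continue  # missing cells simply produce no node
        if isinstance(value, (int, float)):
            # Assumption: numeric cell = value * column-name embedding.
            feature = [value * x for x in toy_string_embedding(column)]
        else:
            feature = toy_string_embedding(str(value))
        nodes.append(feature)
        edges.append((0, len(nodes) - 1, toy_string_embedding(column)))
    return {"nodes": nodes, "edges": edges}


# Two rows from tables with different, unmatched schemas.
row_a = {"name": "Chateau Margaux", "country": "France", "price": 450.0}
row_b = {"Producer": "Margaux", "Nation": "FR", "Vintage": 2015, "Score": None}

graph_a = row_to_graph(row_a)
graph_b = row_to_graph(row_b)
print(len(graph_a["nodes"]), len(graph_b["nodes"]))
```

Because column names enter only as edge features, rows from tables with different columns, column orders, or column names all map to the same kind of object, which is the property that lets a single model consume unmatched tables.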

Code & Models

Repositories

soda-inria/carte
PyTorch · Official

Datasets

inria-soda/carte-benchmark
dataset · 4.8k dl

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Advanced Graph Neural Networks

Methods: Sparse Evolutionary Training · Attentive Walk-Aggregating Graph Neural Network