TabLLM: Few-shot Classification of Tabular Data with Large Language Models

Stefan Hegselmann; Alejandro Buendia; Hunter Lang; Monica Agrawal; Xiaoyi Jiang; David Sontag

arXiv:2210.10723 · cs.CL · March 20, 2023 · 49 cites

TL;DR

This paper applies large language models to zero-shot and few-shot classification of tabular data by serializing each table row into a natural-language prompt. The approach outperforms prior deep-learning-based tabular classifiers on several benchmark datasets.

Contribution

It introduces a simple serialization approach to apply large language models to tabular classification, demonstrating competitive performance with traditional methods.

Findings

01 In most cases, zero-shot classification achieves non-trivial performance.

02 Few-shot fine-tuning improves performance significantly.

03 The method outperforms prior deep-learning-based tabular classifiers on several benchmark datasets.

Abstract

We study the application of large language models to zero-shot and few-shot classification of tabular data. We prompt the large language model with a serialization of the tabular data to a natural-language string, together with a short description of the classification problem. In the few-shot setting, we fine-tune the large language model using some labeled examples. We evaluate several serialization methods including templates, table-to-text models, and large language models. Despite its simplicity, we find that this technique outperforms prior deep-learning-based tabular classification methods on several benchmark datasets. In most cases, even zero-shot classification obtains non-trivial performance, illustrating the method's ability to exploit prior knowledge encoded in large language models. Unlike many deep learning methods for tabular datasets, this approach is also competitive…
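The serialization step described in the abstract can be illustrated with a minimal Python sketch. It assumes a simple text template of the form "The <column> is <value>." followed by a task question. The function names `serialize_row` and `build_prompt`, the example columns, and the question wording are hypothetical and are not taken from the paper's repository.

```python
from typing import Mapping


def serialize_row(row: Mapping[str, object]) -> str:
    """Turn one table row into a string, one short sentence per column.

    Assumed template (illustrative only): "The <column> is <value>."
    """
    return " ".join(f"The {col} is {val}." for col, val in row.items())


def build_prompt(row: Mapping[str, object], task_description: str) -> str:
    """Combine the serialized row with a short description of the task."""
    return f"{serialize_row(row)}\n\n{task_description}\nAnswer:"


# Hypothetical example row and task question.
example_row = {"age": 42, "education": "Masters", "occupation": "Sales"}
question = "Does this person earn more than 50000 dollars per year? Yes or no?"

print(build_prompt(example_row, question))
```

The resulting string is what would be passed to the language model. In the zero-shot setting the model answers directly, and in the few-shot setting a handful of such prompts paired with labels would serve as fine-tuning examples.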

Code & Models

Repositories

clinicalml/TabLLM
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques