Scalable In-Context Learning on Tabular Data via Retrieval-Augmented Large Language Models

Xumeng Wen; Shun Zheng; Zhen Xu; Yiming Sun; Jiang Bian

arXiv:2502.03147 · cs.CL · February 6, 2025

TL;DR

This paper introduces a retrieval-augmented approach to large language models for scalable in-context learning on tabular data, overcoming sequence length limitations and improving performance across diverse datasets.

Contribution

The paper presents a retrieval module customized for tabular data, combined with a retrieval-guided instruction-tuning method, so that LLMs can make effective use of larger tabular datasets.

Findings

01 Improved performance on 69 datasets

02 Enhanced scalability with larger datasets

03 Uncovered powerful algorithms in limited contexts

Abstract

Recent studies have shown that large language models (LLMs), when customized with post-training on tabular data, can acquire general tabular in-context learning (TabICL) capabilities. These models are able to transfer effectively across diverse data schemas and different task domains. However, existing LLM-based TabICL approaches are constrained to few-shot scenarios due to the sequence length limitations of LLMs, as tabular instances represented in plain text consume substantial tokens. To address this limitation and enable scalable TabICL for any data size, we propose retrieval-augmented LLMs tailored to tabular data. Our approach incorporates a customized retrieval module, combined with retrieval-guided instruction-tuning for LLMs. This enables LLMs to effectively leverage larger datasets, achieving significantly improved performance across 69 widely recognized datasets and…
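
The retrieval idea in the abstract can be illustrated with a minimal sketch: instead of placing a whole table in the prompt, select only the few training rows most relevant to the query and serialize those as in-context examples. Everything below is an assumption made for illustration, not the paper's method: the Euclidean nearest-neighbor retrieval stands in for the customized retrieval module, and the function names, the `name = value` text template, and the toy data are hypothetical.

```python
import math


def retrieve_neighbors(train_rows, train_labels, query, k=3):
    """Return the k (row, label) pairs closest to `query`.

    Assumption: plain Euclidean distance on raw numeric features. A real
    system would at least scale features and handle categorical columns.
    """
    pairs = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    return pairs[:k]


def serialize_row(feature_names, row):
    """Render one tabular instance as plain text (hypothetical template)."""
    return ", ".join(f"{name} = {value}" for name, value in zip(feature_names, row))


def build_prompt(feature_names, neighbors, query, target_name="label"):
    """Concatenate retrieved examples, then the unlabeled query row."""
    lines = [
        f"{serialize_row(feature_names, row)} -> {target_name}: {label}"
        for row, label in neighbors
    ]
    lines.append(f"{serialize_row(feature_names, query)} -> {target_name}:")
    return "\n".join(lines)


# Toy data, invented for this example.
feature_names = ["age", "income"]
train_rows = [(25, 30.0), (47, 82.0), (52, 90.0), (23, 28.0)]
train_labels = ["no", "yes", "yes", "no"]
query = (50, 85.0)

neighbors = retrieve_neighbors(train_rows, train_labels, query, k=2)
prompt = build_prompt(feature_names, neighbors, query)
print(prompt)
```

Because the prompt contains only `k` retrieved rows, its length is governed by `k` rather than by the size of the training table, which is the property that lets in-context learning scale beyond the few-shot setting.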

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications