Language Model Representations for Efficient Few-Shot Tabular Classification
Inwon Kang, Parikshit Ram, Yi Zhou, Horst Samulowitz, Oshani Seneviratne

TL;DR
This paper explores using large language models for few-shot classification of web-native tables by leveraging semantic embeddings and simple calibration techniques, achieving competitive performance without retraining.
Contribution
It introduces TaRL, a lightweight method that utilizes LLM embeddings for tabular classification, with techniques to improve performance in low-data scenarios.
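The contribution can be illustrated with a minimal sketch of one plausible instantiation. Each row is serialized to text with a hypothetical `serialize_row` template. It is then embedded by whatever LLM embedding endpoint is already deployed (not shown; any function that returns a fixed-length vector works). Finally, it is classified by cosine similarity to class prototypes averaged from the few labeled rows. The template, the prototype rule, and all function names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def serialize_row(row: dict) -> str:
    # Hypothetical template: one "column is value" clause per cell.
    return "; ".join(f"{col} is {val}" for col, val in row.items())

def fit_prototypes(embeddings: np.ndarray, labels: list):
    """Average the few-shot embeddings of each class into one prototype vector."""
    classes = sorted(set(labels))
    y = np.asarray(labels)
    protos = np.stack([embeddings[y == c].mean(axis=0) for c in classes])
    return classes, protos

def predict(query: np.ndarray, classes: list, protos: np.ndarray):
    """Return the class whose prototype has the highest cosine similarity to the query."""
    norms = np.linalg.norm(protos, axis=1) * np.linalg.norm(query) + 1e-12
    sims = (protos @ query) / norms
    return classes[int(np.argmax(sims))]
```

In use, each serialized row (e.g. `serialize_row({"price": 19.99, "category": "books"})`, which yields `"price is 19.99; category is books"`) would be passed through the deployed embedding model. The resulting vectors then feed `fit_prototypes` and `predict`. No model weights are trained, which matches the "without retraining" framing.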
Findings
Naive embeddings underperform specialized models.
Removing common embedding components improves results.
Calibrating softmax temperature enhances classification accuracy.
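The two fixes in the findings can be sketched as follows, under stated assumptions. "Removing common components" is implemented here in the spirit of the "all-but-the-top" post-processing for embeddings. The mean and the top-`k` principal directions of the support embeddings are subtracted, with `k` as an assumed hyperparameter. Temperature matters only if predictions aggregate several softmax weights. So this sketch assumes a soft nearest-neighbor rule: a softmax over similarities to every support example, summed per class. `T` is chosen by leave-one-out negative log-likelihood over a hypothetical grid. All names and the grid are illustrative, not taken from the paper.

```python
import numpy as np

def fit_common_components(support: np.ndarray, k: int = 1):
    """Estimate the shared mean and top-k principal directions of the support embeddings."""
    mu = support.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(support - mu, full_matrices=False)
    return mu, vt[:k]

def strip_common(E: np.ndarray, mu: np.ndarray, top: np.ndarray) -> np.ndarray:
    """Subtract the mean and project out the top directions (accepts 1-D or 2-D input)."""
    centered = E - mu
    return centered - (centered @ top.T) @ top

def _unit(E: np.ndarray) -> np.ndarray:
    # L2-normalize along the last axis so dot products are cosine similarities.
    return E / (np.linalg.norm(E, axis=-1, keepdims=True) + 1e-12)

def class_probs(query, support, labels: np.ndarray, classes: list, T: float):
    """Temperature-scaled softmax over support similarities, summed per class."""
    logits = (_unit(support) @ _unit(query)) / T
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    w /= w.sum()
    return np.array([w[labels == c].sum() for c in classes])

def calibrate_temperature(support, labels: np.ndarray, classes: list,
                          grid=(0.01, 0.05, 0.1, 0.5, 1.0)):
    """Pick T from a hypothetical grid by leave-one-out negative log-likelihood."""
    n = len(labels)
    best_T, best_nll = grid[0], np.inf
    for T in grid:
        nll = 0.0
        for i in range(n):
            keep = np.arange(n) != i
            p = class_probs(support[i], support[keep], labels[keep], classes, T)
            nll -= np.log(p[classes.index(labels[i])] + 1e-12)
        if nll < best_nll:
            best_T, best_nll = T, nll
    return best_T
```

In this sketch, `strip_common` is applied with the same `mu` and `top` to both support and query embeddings before `class_probs` is called. A small `T` makes the rule behave like 1-nearest-neighbor. A large `T` lets classes with many moderately similar examples win, which is why calibrating it can change accuracy.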
Abstract
The Web is a rich source of structured data in the form of tables, from product catalogs and knowledge bases to scientific datasets. However, the heterogeneity of the structure and semantics of these tables makes it challenging to build a unified method that can effectively leverage the information they contain. Meanwhile, large language models (LLMs) are becoming an increasingly integral component of web infrastructure for tasks like semantic search. This raises a crucial question: can we leverage these already-deployed LLMs to classify structured data in web-native tables (e.g., product catalogs, knowledge base exports, scientific data portals), avoiding the need for specialized models or extensive retraining? This work investigates a lightweight paradigm, Table Representation with Language Model (TaRL), for few-shot tabular classification…
Taxonomy
Topics: Data Quality and Management · Data Visualization and Analytics · Web Data Mining and Analysis
