Automatic Demonstration Selection for LLM-based Tabular Data Classification
Shuchu Han, Wolfgang Bruckner

TL;DR
This paper introduces an algorithm that automatically estimates a reasonable number of demonstrations for in-context learning in tabular data classification, using spectral graph theory to analyze the similarities between demonstrations.
Contribution
It presents a novel spectral graph theory-based method that considers data distribution, prompt template, and LLM characteristics to select demonstrations effectively.
Findings
The method outperforms random selection algorithms in experiments.
It effectively estimates the minimal number of demonstrations needed.
The approach adapts to different datasets and LLMs.
Abstract
A fundamental question in applying In-Context Learning (ICL) for tabular data classification is how to determine the ideal number of demonstrations in the prompt. This work addresses this challenge by presenting an algorithm to automatically select a reasonable number of required demonstrations. Our method distinguishes itself by integrating not only the tabular data's distribution but also the user's selected prompt template and the specific Large Language Model (LLM) into its estimation. Rooted in Spectral Graph Theory, our proposed algorithm defines a novel metric to quantify the similarities between different demonstrations. We then construct a similarity graph and analyze the eigenvalues of its Laplacian to derive the minimum number of demonstrations capable of representing the data within the LLM's intrinsic representation space. We validate the efficacy of our approach through…
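The graph-based estimation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's similarity metric or its exact rule for reading the count from the spectrum, so everything here is an assumption: demonstrations are represented by hypothetical embedding vectors, similarity is a Gaussian (RBF) kernel with an assumed bandwidth `sigma`, and the count is read off with the standard eigengap heuristic from spectral clustering. The function name `estimate_num_demonstrations` is illustrative and is not taken from the paper.

```python
import numpy as np


def estimate_num_demonstrations(embeddings, sigma=1.0):
    """Eigengap estimate of how many groups a similarity graph contains.

    Assumptions (not from the paper): `embeddings` stands in for the LLM's
    representation of each demonstration, and an RBF kernel stands in for
    the paper's similarity metric.
    """
    X = np.asarray(embeddings, dtype=float)

    # Pairwise squared Euclidean distances between demonstrations.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

    # Weighted adjacency matrix of the similarity graph (no self-loops).
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Eigenvalues in ascending order; L is symmetric, so eigvalsh applies.
    eigvals = np.linalg.eigvalsh(L)

    # Eigengap heuristic: the largest jump between consecutive eigenvalues
    # marks the number of near-zero eigenvalues, i.e. well-separated groups.
    gaps = np.diff(eigvals)
    k = int(np.argmax(gaps)) + 1
    return k, eigvals
```

Under these assumptions, the returned `k` is the number of well-separated groups in the similarity graph, which is one plausible reading of "the minimum number of demonstrations capable of representing the data". The result is sensitive to `sigma`, so the bandwidth would need tuning for real embeddings.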
Taxonomy
Topics: Machine Learning and Data Classification
