Large Language Models are Demonstration Pre-Selectors for Themselves

Jiarui Jin; Yuwei Wu; Haoxuan Li; Xiaoting He; Weinan Zhang; Yiming Yang; Yong Yu; Jun Wang; Mengyue Yang

arXiv:2506.06033 · cs.CL · June 9, 2025



TL;DR

This paper introduces FEEDER, a pre-selection framework that efficiently identifies a representative subset of training data for large language models, reducing data size by over 20% while maintaining performance in in-context learning and fine-tuning.

Contribution

The paper proposes FEEDER, a pre-selection method that uses sufficiency and necessity metrics, together with a tree-based algorithm, to select representative training examples tailored to a specific LLM, improving the efficiency of demonstration selection.

Findings

1. Reduces training data size by over 20% without performance loss.

2. Maintains comparable in-context learning performance with fewer demonstrations.

3. Enhances fine-tuning efficiency through bi-level optimization.

Abstract

In-context learning (ICL) with large language models (LLMs) delivers strong few-shot performance by choosing few-shot demonstrations from the entire training data. However, existing ICL methods, which rely on similarity or diversity scores to choose demonstrations, incur high computational costs due to repeated retrieval from large-scale datasets for each query. To this end, we propose FEEDER (FEw yet Essential Demonstration prE-selectoR), a novel pre-selection framework that identifies a representative subset of demonstrations containing the most representative examples in the training data, tailored to specific LLMs. To construct this subset, we introduce the "sufficiency" and "necessity" metrics in the pre-selection stage and design a tree-based algorithm to identify representative examples efficiently. Once pre-selected, this representative subset can effectively replace the full…
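To make the pre-selection idea concrete, here is a minimal Python sketch. It is not the paper's tree-based algorithm, and the abstract does not define the metrics formally, so the readings used here are assumptions: an example is treated as "sufficiently" covered when the model, given the current subset as demonstrations, already produces its label, and a kept example is treated as "necessary" when removing it breaks that coverage for some training example. The names `greedy_preselect`, `is_sufficient`, and `toy_predict` are hypothetical, and `toy_predict` is a word-overlap stand-in for a real LLM call.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, label)
Predictor = Callable[[Sequence[Example], str], str]


def is_sufficient(subset: Sequence[Example], example: Example,
                  predict: Predictor) -> bool:
    # Assumed reading of "sufficiency": the subset alone, used as
    # demonstrations, already yields the example's label.
    x, y = example
    return predict(subset, x) == y


def greedy_preselect(train: Sequence[Example],
                     predict: Predictor) -> List[Example]:
    # Simple greedy stand-in, NOT the tree-based algorithm from the paper.
    kept: List[Example] = []
    # Pass 1: keep an example only if the current subset does not cover it.
    for ex in train:
        if not is_sufficient(kept, ex, predict):
            kept.append(ex)
    # Pass 2 (assumed reading of "necessity"): drop a kept example if the
    # remaining ones still cover every training example.
    for ex in list(kept):
        rest = [e for e in kept if e is not ex]
        if all(is_sufficient(rest, t, predict) for t in train):
            kept = rest
    return kept


def toy_predict(demos: Sequence[Example], x: str) -> str:
    # Hypothetical stand-in for an LLM: copy the label of the
    # demonstration that shares the most words with the query.
    words = set(x.split())
    best_label, best_overlap = "", 0
    for demo_x, demo_y in demos:
        overlap = len(words & set(demo_x.split()))
        if overlap > best_overlap:
            best_label, best_overlap = demo_y, overlap
    return best_label


train = [
    ("great movie", "pos"),
    ("great film", "pos"),
    ("awful movie", "neg"),
    ("awful film", "neg"),
]
subset = greedy_preselect(train, toy_predict)
print(len(subset), "of", len(train), "examples kept:", subset)
```

The second pass calls the predictor once per training example for each kept example, so this sketch illustrates the selection criteria rather than an efficient procedure. With a real model, `predict` would wrap a prompt built from the demonstrations, and the resulting subset would depend on that specific model, which matches the abstract's point that the subset is tailored to specific LLMs.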


Videos

Large Language Models are Demonstration Pre-Selectors for Themselves · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Machine Learning and Data Classification