In-Context Learning on a Budget: A Case Study in Token Classification
Uri Berger, Tal Baumel, Gabriel Stanovsky

TL;DR
This paper studies how to choose a small set of samples to annotate for token classification under a fixed annotation budget. It finds that simple selection methods, including random selection, often perform on par with more complex ones, and that a small annotated pool can match the performance of the full training set.
Contribution
It introduces a realistic in-context learning paradigm considering annotation budgets for token classification, and systematically evaluates sample selection methods under this constraint.
Findings
No method significantly outperforms others, including random selection.
Small annotated pools can achieve performance similar to full datasets.
Most methods yield comparable results across various tasks and models.
Abstract
Few-shot in-context learning (ICL) typically assumes access to large annotated training sets. However, in many real-world scenarios, such as domain adaptation, there is only a limited budget to annotate a small number of samples, with the goal of maximizing downstream performance. We study various methods for selecting samples to annotate within a predefined budget, focusing on token classification tasks, which are expensive to annotate and are relatively less studied in ICL setups. Across various tasks, models, and datasets, we observe that no method significantly outperforms the others, with most yielding similar results, including random sample selection for annotation. Moreover, we demonstrate that a relatively small annotated sample pool can achieve performance comparable to using the entire training set. We hope that future work adopts our realistic paradigm which takes annotation…
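To make the budgeted setup concrete, here is a minimal sketch of the random-selection baseline the abstract mentions, followed by a simple few-shot prompt built from the annotated pool. The function names (`select_for_annotation`, `build_prompt`), the `Tokens:`/`Tags:` prompt layout, and the choice to use the first `n_demos` annotated samples as demonstrations are all illustrative assumptions, not the authors' implementation.

```python
import random


def select_for_annotation(unlabeled, budget, seed=0):
    """Randomly pick up to `budget` sentences to send for annotation.

    `unlabeled` is a list of tokenized sentences (lists of strings).
    This is only the random baseline; other selection methods would
    replace the sampling step.
    """
    rng = random.Random(seed)
    k = min(budget, len(unlabeled))
    return rng.sample(unlabeled, k)


def build_prompt(annotated_pool, query_tokens, n_demos=2):
    """Format annotated (tokens, tags) pairs as few-shot demonstrations.

    The prompt layout is hypothetical: one block per demonstration,
    then the query with an empty `Tags:` slot for the model to fill.
    """
    blocks = []
    for tokens, tags in annotated_pool[:n_demos]:
        blocks.append("Tokens: " + " ".join(tokens) + "\nTags: " + " ".join(tags))
    blocks.append("Tokens: " + " ".join(query_tokens) + "\nTags:")
    return "\n\n".join(blocks)
```

In use, the sentences returned by `select_for_annotation` would be labeled by annotators, and the resulting `(tokens, tags)` pairs passed as `annotated_pool` to `build_prompt` for each test sentence.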
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
