In-Context Learning on a Budget: A Case Study in Token Classification

Uri Berger; Tal Baumel; Gabriel Stanovsky

arXiv:2406.13274 · cs.CL · January 29, 2025


TL;DR

This paper investigates how to effectively select a small number of samples for annotation in token classification tasks to maximize performance, revealing that simple methods often perform comparably to more complex ones and that small annotated pools can match full dataset performance.

Contribution

It introduces a realistic in-context learning paradigm considering annotation budgets for token classification, and systematically evaluates sample selection methods under this constraint.
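The budgeted setup can be illustrated with a minimal sketch of the random-selection baseline: pick a fixed number of unlabeled sentences to annotate, then use the annotated ones as in-context demonstrations. The function names, the `budget` and `k` parameters, and the prompt layout are hypothetical choices made for illustration; they are not taken from the paper.

```python
import random


def select_for_annotation(unlabeled_pool, budget, seed=0):
    """Randomly choose `budget` sentences from the unlabeled pool to annotate.

    Hypothetical helper: a stand-in for any selection method under a budget.
    """
    if budget > len(unlabeled_pool):
        raise ValueError("budget exceeds the size of the unlabeled pool")
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    return rng.sample(unlabeled_pool, budget)


def build_prompt(annotated, query_tokens, k=2):
    """Format up to k annotated (tokens, tags) pairs, then the query sentence.

    The prompt layout is an assumption, not the format used by the authors.
    """
    lines = []
    for tokens, tags in annotated[:k]:
        lines.append("Sentence: " + " ".join(tokens))
        lines.append("Tags: " + " ".join(tags))
    lines.append("Sentence: " + " ".join(query_tokens))
    lines.append("Tags:")
    return "\n".join(lines)


if __name__ == "__main__":
    pool = [["Paris", "is", "nice"], ["Call", "Dana"], ["It", "rains"]]
    chosen = select_for_annotation(pool, budget=2)
    # In practice a human annotator would supply the tags for `chosen`;
    # here every token is tagged "O" purely as a placeholder.
    annotated = [(tokens, ["O"] * len(tokens)) for tokens in chosen]
    print(build_prompt(annotated, ["Meet", "Noa", "tomorrow"]))
```

Any other selection method would replace only `select_for_annotation`; the budget and the prompt construction stay the same, which is what makes methods comparable under a fixed annotation cost.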

Findings

01. No method significantly outperforms others, including random selection.

02. Small annotated pools can achieve performance similar to full datasets.

03. Most methods yield comparable results across various tasks and models.

Abstract

Few-shot in-context learning (ICL) typically assumes access to large annotated training sets. However, in many real-world scenarios, such as domain adaptation, there is only a limited budget to annotate a small number of samples, with the goal of maximizing downstream performance. We study various methods for selecting samples to annotate within a predefined budget, focusing on token classification tasks, which are expensive to annotate and are relatively less studied in ICL setups. Across various tasks, models, and datasets, we observe that no method significantly outperforms the others, with most yielding similar results, including random sample selection for annotation. Moreover, we demonstrate that a relatively small annotated sample pool can achieve performance comparable to using the entire training set. We hope that future work adopts our realistic paradigm which takes annotation…


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management