Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification

Jan Cegin; Branislav Pecher; Jakub Simko; Ivan Srba; Maria Bielikova; Peter Brusilovsky

arXiv:2410.10756·cs.CL·October 15, 2024

Open Access · 1 Repo

TL;DR

This paper compares random and informed sample selection strategies for LLM-based text augmentation in classification tasks, finding that random selection remains effective despite some marginal gains from informed methods.

Contribution

It provides a comprehensive comparison of sample selection strategies in few-shot LLM-based augmentation, highlighting the limited benefits of informed selection over random sampling.

Findings

01. Informed strategies can improve out-of-distribution performance.

02. Random selection remains a strong baseline.

03. Performance gains from informed strategies are marginal.

Abstract

Generative large language models (LLMs) are increasingly used for data augmentation, where text samples are paraphrased (or generated anew) and then used for classifier fine-tuning. Existing work on augmentation leverages few-shot scenarios, where samples are given to LLMs as part of prompts, leading to better augmentations. Yet the samples are mostly selected randomly, and a comprehensive overview of the effects of other (more "informed") sample selection strategies is lacking. In this work, we compare sample selection strategies from the few-shot learning literature and investigate their effects in LLM-based textual augmentation. We evaluate this on in-distribution and out-of-distribution classifier performance. Results indicate that while some "informed" selection strategies increase the performance of models, especially for out-of-distribution data, it happens…
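As a rough illustration of the few-shot setup the abstract describes, the following Python sketch picks seed samples uniformly at random and places them in a paraphrasing prompt. It is not taken from the paper or its repository: the function names `select_random` and `build_paraphrase_prompt`, the prompt wording, the `book_flight` label, and the toy sample pool are all hypothetical, and the call to an actual LLM is left out.

```python
import random


def select_random(pool, k, seed=0):
    """Pick k few-shot samples uniformly at random, without replacement.

    A seeded generator keeps the selection reproducible across runs.
    """
    rng = random.Random(seed)
    return rng.sample(pool, k)


def build_paraphrase_prompt(examples, label, target):
    """Assemble a plain-text prompt (hypothetical wording) that shows the
    selected samples and asks for a paraphrase of the target text."""
    lines = [f"Here are example texts for the label '{label}':"]
    lines += [f"{i}. {text}" for i, text in enumerate(examples, start=1)]
    lines.append(
        f"Paraphrase the following text so that it keeps the same label: {target}"
    )
    return "\n".join(lines)


# Toy pool of labeled texts (made up for illustration only).
pool = [
    "Book me a flight to Prague tomorrow.",
    "I want to fly to Berlin next week.",
    "Can you reserve a plane ticket to Rome?",
    "Find me a flight from Bratislava to Paris.",
]

examples = select_random(pool, k=2, seed=42)
prompt = build_paraphrase_prompt(
    examples, label="book_flight", target="I need a flight to Vienna."
)
print(prompt)
```

An "informed" strategy would slot in by swapping `select_random` for a function with the same signature that ranks the pool by some criterion before choosing the `k` samples; the prompt-building step stays unchanged.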

Code & Models

Repositories

kinit-sk/selec-strats-for-aug
pytorch · Official

Taxonomy

Topics: Text and Document Classification Technologies · Natural Language Processing Techniques