On Training Instance Selection for Few-Shot Neural Text Generation
Ernie Chang, Xiaoyu Shen, Hui-Syuan Yeh, Vera Demberg

TL;DR
This paper investigates how clustering-based selection of training instances can improve few-shot neural text generation; the approach outperforms random sampling across multiple tasks.
Contribution
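The paper introduces a K-means-based strategy for choosing which unlabeled instances to annotate for few-shot training. The strategy favors instances that are diverse and representative of the full data distribution, and it improves model performance over random sampling.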
Findings
Clustering-based selection outperforms random sampling in text generation tasks.
Simple K-means clustering effectively identifies valuable training instances.
The approach improves performance on data-to-text, summarization, and question generation.
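The clustering-based selection summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each unlabeled input has already been mapped to a fixed-size vector (the representation is not specified here), that the number of clusters equals the annotation budget, and that the instance nearest to each centroid is the one sent for labeling. The function names `kmeans` and `select_instances` are hypothetical, and the K-means loop is a plain standard-library version written only for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd-style K-means; returns (centroids, assignment)."""
    rng = random.Random(seed)
    # Assumption: centroids are initialized from k randomly sampled points.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean_vector(members)
    return centroids, assignment


def select_instances(points, budget, seed=0):
    """Return sorted indices of the points to annotate, one per non-empty cluster.

    Assumption: the point closest to each centroid represents its cluster.
    """
    centroids, assignment = kmeans(points, budget, seed=seed)
    selected = []
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            nearest = min(
                members, key=lambda i: squared_distance(points[i], centroid)
            )
            selected.append(nearest)
    return sorted(selected)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for representations of unlabeled inputs.
    toy_points = [
        [0.0, 0.0], [0.5, 0.2], [0.1, 0.6],
        [10.0, 10.0], [10.3, 9.8], [9.7, 10.4],
        [20.0, 0.0], [20.4, 0.3], [19.8, -0.2],
    ]
    print(select_instances(toy_points, budget=3))
```

Random sampling would instead draw `budget` indices uniformly, so it can by chance pick several near-duplicate inputs. The sketch spreads the picks across clusters, which is the diversity and representativeness intuition stated in the abstract. Because the initialization is random, the result can vary with `seed`.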
Abstract
Large-scale pretrained language models have led to dramatic improvements in text generation. Impressive performance can be achieved by finetuning only on a small number of instances (few-shot setting). Nonetheless, almost all previous work simply applies random sampling to select the few-shot training instances. Little to no attention has been paid to the selection strategies and how they would affect model performance. In this work, we present a study on training instance selection in few-shot neural text generation. The selection decision is made based only on the unlabeled data so as to identify the most worthwhile data points that should be annotated under some budget of labeling cost. Based on the intuition that the few-shot training instances should be diverse and representative of the entire data distribution, we propose a simple selection strategy with K-means clustering. We…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
