SelectLLM: Can LLMs Select Important Instructions to Annotate?

Ritik Sachin Parkar; Jaehyung Kim; Jong Inn Park; Dongyeop Kang

arXiv:2401.16553 · cs.CL · August 28, 2024 · 1 citation

TL;DR

SelectLLM is a framework that combines coreset-based clustering with LLM prompting to choose which unlabeled instructions are most worth annotating, so that a limited labeling budget yields a more useful instruction-tuning dataset.

Contribution

It introduces a novel method combining coreset clustering and LLM prompting for instruction selection, outperforming existing approaches.

Findings

01 Outperforms state-of-the-art selection methods such as AlpaGasus

02 Effective across multiple LLMs, such as ChatGPT and LLaMA-3.1-70B

03 Maintains high performance on both human-written and synthetic datasets

Abstract

Instruction tuning benefits from large and diverse datasets; however, creating such datasets involves a high cost of human labeling. While synthetic datasets generated by large language models (LLMs) have partly solved this issue, they often contain low-quality data. One effective solution is to selectively annotate unlabeled instructions, especially given the relative ease of acquiring unlabeled instructions or texts from various sources. However, how to select unlabeled instructions is not well explored, especially in the context of LLMs. Therefore, we introduce SelectLLM, an alternative framework that leverages the capabilities of LLMs to select unlabeled instructions more effectively. Specifically, SelectLLM consists of two key steps: coreset-based clustering of unlabeled instructions to increase diversity, and prompting an LLM to identify the most beneficial instructions within…
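
The two steps named in the abstract (cluster the unlabeled instructions, then ask an LLM to pick within each cluster) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the toy bag-of-words `embed` stands in for a real sentence encoder, greedy k-center selection stands in for whatever coreset-based clustering the authors use, the prompt wording is invented, and `llm` is a hypothetical callable that maps a prompt string to a reply string.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: a real system would use a sentence encoder)."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def k_center_clusters(texts: List[str], k: int) -> List[List[int]]:
    """Greedy k-center: pick spread-out centers, then assign each text to its nearest center."""
    vecs = [embed(t) for t in texts]
    centers = [0]
    dist = [cosine_distance(v, vecs[0]) for v in vecs]
    while len(centers) < min(k, len(texts)):
        nxt = max(range(len(texts)), key=lambda i: dist[i])
        if dist[nxt] <= 1e-12:  # every remaining text duplicates an existing center
            break
        centers.append(nxt)
        dist = [min(d, cosine_distance(v, vecs[nxt])) for d, v in zip(dist, vecs)]
    clusters = {c: [] for c in centers}
    for i, v in enumerate(vecs):
        nearest = min(centers, key=lambda c: cosine_distance(v, vecs[c]))
        clusters[nearest].append(i)
    return list(clusters.values())


def build_prompt(candidates: List[str], n_select: int) -> str:
    """Hypothetical prompt wording; the paper's actual prompt is not shown on this page."""
    numbered = "\n".join(f"{i + 1}. {text}" for i, text in enumerate(candidates))
    return (
        f"Choose the {n_select} instruction(s) below that would be most useful to annotate "
        f"for instruction tuning.\n{numbered}\n"
        "Answer with the numbers only, comma-separated."
    )


def parse_choice(reply: str, n_candidates: int, n_select: int) -> List[int]:
    """Extract valid, de-duplicated 0-based indices from the model's reply."""
    picked: List[int] = []
    for token in re.findall(r"\d+", reply):
        idx = int(token) - 1
        if 0 <= idx < n_candidates and idx not in picked:
            picked.append(idx)
    return picked[:n_select]


def select_instructions(
    texts: List[str], k: int, n_per_cluster: int, llm: Callable[[str], str]
) -> List[str]:
    """Step 1: cluster for diversity. Step 2: let the LLM choose within each cluster."""
    selected: List[str] = []
    for members in k_center_clusters(texts, k):
        candidates = [texts[i] for i in members]
        reply = llm(build_prompt(candidates, n_per_cluster))
        for local in parse_choice(reply, len(candidates), n_per_cluster):
            selected.append(texts[members[local]])
    return selected
```

To try it without an API key, pass a stub such as `lambda prompt: "1"` as `llm`; it always picks the first candidate in each cluster. Parsing the reply defensively matters in practice, since a model may return out-of-range or repeated numbers.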

Code & Models

Repositories

minnesotanlp/select-llm
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Mathematics, Computing, and Information Processing