SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection
Liangxin Liu, Xuebo Liu, Derek F. Wong, Dongfang Li, Ziyi Wang, Baotian Hu, Min Zhang

TL;DR
SelectIT introduces an uncertainty-aware self-reflection method that enables efficient instruction data selection for LLMs, improving performance without additional models or data, and demonstrates its effectiveness across models and domains.
Contribution
The paper proposes a novel LLM-based data selection approach for instruction tuning, introduces the Selective Alpaca dataset built with it, and shows improved model performance without additional models or data.
Findings
Selective Alpaca enhances model abilities.
SelectIT improves robustness across models.
Longer, high-quality data boosts instruction tuning.
Abstract
Instruction tuning (IT) is crucial to tailoring large language models (LLMs) towards human-centric interactions. Recent advancements have shown that the careful selection of a small, high-quality subset of IT data can significantly enhance the performance of LLMs. Despite this, common approaches often rely on additional models or data, which increases costs and limits widespread adoption. In this work, we propose a novel approach, termed SelectIT, that capitalizes on the foundational capabilities of the LLM itself. Specifically, we exploit the intrinsic uncertainty present in LLMs to more effectively select high-quality IT data, without the need for extra resources. Furthermore, we introduce a curated IT dataset, the Selective Alpaca, created by applying SelectIT to the Alpaca-GPT4 dataset. Empirical results demonstrate that IT using Selective Alpaca leads to substantial model ability…
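The abstract does not spell out how uncertainty enters the selection step, so the following is only a minimal sketch of one way uncertainty-aware selection could look, not the authors' implementation. It assumes each sample already has several 1-5 quality ratings (for example, from the same LLM queried with differently worded rating prompts) and treats disagreement among them as uncertainty. The function names, the `alpha` penalty weight, and the `fraction` parameter are all hypothetical.

```python
from statistics import mean, pstdev


def uncertainty_aware_score(ratings, alpha=0.2):
    """Collapse several ratings of one sample into a single score.

    ratings: hypothetical 1-5 quality ratings of the same sample.
    The spread of the ratings (population standard deviation) is used
    as an uncertainty estimate and lowers the score; alpha is an
    assumed weight for that penalty.
    """
    return mean(ratings) / (1.0 + alpha * pstdev(ratings))


def select_top_fraction(samples, ratings_per_sample, fraction=0.2, alpha=0.2):
    """Keep the highest-scoring fraction of samples (at least one)."""
    scored = sorted(
        zip(samples, ratings_per_sample),
        key=lambda pair: uncertainty_aware_score(pair[1], alpha),
        reverse=True,
    )
    keep = max(1, int(len(samples) * fraction))
    return [sample for sample, _ in scored[:keep]]


if __name__ == "__main__":
    samples = ["a", "b", "c", "d", "e"]
    ratings = [[5, 5, 5], [5, 1, 5], [3, 3, 3], [4, 4, 4], [1, 1, 1]]
    # "b" has a high mean but inconsistent ratings, so it ranks below "d" and "c".
    print(select_top_fraction(samples, ratings, fraction=0.5))
```

Dividing by a term that grows with the spread means a sample rated consistently at 4 can outrank one whose ratings swing between 1 and 5, which is the intuition behind preferring data the model is confident about.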
Taxonomy
Topics: Topic Modeling
