InstructEval: Systematic Evaluation of Instruction Selection Methods
Anirudh Ajith, Chris Pan, Mengzhou Xia, Ameet Deshpande, Karthik Narasimhan

TL;DR
This paper introduces InstructEval, a comprehensive evaluation suite for instruction selection in in-context learning, revealing that manual instructions often outperform automatic methods across diverse models and tasks.
Contribution
The paper develops InstructEval, a new benchmark suite for systematically assessing instruction selection methods across multiple models and tasks, highlighting the limitations of automatic instruction induction.
Findings
For in-context learning (ICL), manual instructions often outperform automatic methods.
Automatic instruction induction methods lack generalizability.
Curated instructions improve ICL performance across models.
Abstract
In-context learning (ICL) performs tasks by prompting a large language model (LLM) using an instruction and a small set of annotated examples called demonstrations. Recent work has shown that precise details of the inputs used in the ICL prompt significantly impact performance, which has incentivized instruction selection algorithms. The effect of instruction-choice however is severely underexplored, with existing analyses restricted to shallow subsets of models and tasks, limiting the generalizability of their insights. We develop InstructEval, an ICL evaluation suite to conduct a thorough assessment of these techniques. The suite includes 13 open-sourced LLMs of varying scales from four model families, and covers nine tasks across three categories. Using the suite, we evaluate the relative performance of seven popular instruction selection methods over five metrics relevant to ICL.…
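As a rough illustration of the prompt structure the abstract describes (an instruction followed by a few annotated demonstrations and then the test input), here is a minimal Python sketch. The function name `build_icl_prompt`, the `Input:`/`Label:` template, and the example sentiment task are all assumptions made for illustration; they are not taken from the paper or from InstructEval's code.

```python
from typing import List, Tuple


def build_icl_prompt(
    instruction: str,
    demonstrations: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble an ICL prompt from an instruction, labeled demonstrations, and a query.

    The "Input:/Label:" layout is a hypothetical template chosen for this sketch.
    """
    parts = []
    # An empty instruction is allowed, which gives a demonstrations-only prompt.
    if instruction.strip():
        parts.append(instruction.strip())
    # Each demonstration is an (input text, gold label) pair.
    for text, label in demonstrations:
        parts.append(f"Input: {text}\nLabel: {label}")
    # The query is left unlabeled so the model completes the label.
    parts.append(f"Input: {query}\nLabel:")
    return "\n\n".join(parts)


# Hypothetical sentiment-classification example.
demos = [("great movie", "positive"), ("dull plot", "negative")]
prompt = build_icl_prompt("Classify the sentiment of the input.", demos, "loved it")
print(prompt)
```

Under this framing, an instruction selection method only changes the `instruction` argument while the demonstrations and query stay fixed, which is the variable the evaluation suite is described as isolating.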
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
