Automated Instruction Revision (AIR): A Structured Comparison of Task Adaptation Strategies for LLM
Solomiia Bilyk, Volodymyr Getmanskyi, Taras Firman

TL;DR
This paper evaluates Automated Instruction Revision (AIR), a rule-induction-based method for adapting large language models to downstream tasks from limited task-specific examples, and compares it with prompt optimization, retrieval, and fine-tuning across five benchmarks to identify its strengths and limitations.
Contribution
The paper provides a comprehensive comparison of AIR with prompt optimization, retrieval, and fine-tuning, highlighting task-dependent performance and guiding when to use each adaptation strategy.
Findings
AIR excels in label-remapping classification tasks.
Retrieval-based methods perform best on closed-book QA.
Fine-tuning dominates structured extraction and reasoning tasks.
Abstract
This paper studies Automated Instruction Revision (AIR), a rule-induction-based method for adapting large language models (LLMs) to downstream tasks using limited task-specific examples. We position AIR within the broader landscape of adaptation strategies, including prompt optimization, retrieval-based methods, and fine-tuning. We then compare these approaches across a diverse benchmark suite designed to stress different task requirements, such as knowledge injection, structured extraction, label remapping, and logical reasoning. The paper argues that adaptation performance is strongly task-dependent: no single method dominates across all settings. Across five benchmarks, AIR was strongest or near-best on label-remapping classification, while KNN retrieval performed best on closed-book QA, and fine-tuning dominated structured extraction and event-order reasoning. AIR is most promising…
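To make the retrieval baseline concrete, here is a minimal sketch of KNN-style example retrieval for prompting: for each query, the k most similar labeled examples are selected and placed in the prompt as demonstrations. The abstract does not describe the paper's retrieval setup, so everything here is an assumption for illustration: the bag-of-words cosine similarity stands in for whatever embedding model the authors may use, and `knn_examples`, `build_prompt`, and `POOL` are hypothetical names and toy data.

```python
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words vector (assumed stand-in for a real embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def knn_examples(query, pool, k=2):
    """Return the k (input, output) pairs whose inputs are most similar to the query."""
    q = bow(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, bow(ex[0])), reverse=True)
    return ranked[:k]

def build_prompt(query, pool, k=2):
    """Format the retrieved pairs as few-shot demonstrations, followed by the query."""
    shots = knn_examples(query, pool, k)
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in shots]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

# Toy labeled pool (hypothetical data, not from the paper's benchmarks).
POOL = [
    ("what is the capital of france", "Paris"),
    ("who wrote hamlet", "Shakespeare"),
    ("what is the capital of japan", "Tokyo"),
]

if __name__ == "__main__":
    print(build_prompt("what is the capital of italy", POOL, k=2))
```

In this sketch the retrieved demonstrations change per query, whereas a rule-induction approach such as AIR would, as the abstract describes it, distill the examples into revised instructions instead.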
