IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
Xinghua Zhang, Haiyang Yu, Cheng Fu, Fei Huang, Yongbin Li

TL;DR
This paper introduces IOPO, a novel method for enhancing large language models' ability to follow complex instructions, supported by a new benchmark TRACE, demonstrating significant performance improvements over existing methods.
Contribution
The paper presents IOPO, a new input-output preference optimization technique, and TRACE, a comprehensive benchmark for evaluating complex instruction-following in LLMs.
Findings
IOPO achieves up to 8.15% improvement on in-domain data.
IOPO outperforms SFT and DPO on out-of-domain data.
TRACE provides 120K training examples and 1K evaluation examples for complex instruction following.
Abstract
In the realm of large language models (LLMs), the ability of models to accurately follow instructions is paramount as more agents and applications leverage LLMs for construction, where the complexity of instructions is rapidly increasing. However, on the one hand, there is only a certain amount of complex instruction evaluation data; on the other hand, there are no dedicated algorithms to improve the ability to follow complex instructions. To this end, this paper introduces TRACE, a benchmark for improving and evaluating the complex instruction-following ability, which consists of 120K training data and 1K evaluation data. Furthermore, we propose the IOPO (Input-Output Preference Optimization) alignment method, which takes both input and output preference pairs into consideration, where LLMs not only rapidly align with response preferences but also meticulously explore the instruction…
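For context on the DPO baseline named in the findings, the following is a minimal sketch of the standard Direct Preference Optimization loss for a single output preference pair. It is not the IOPO objective, which the truncated abstract above does not specify. The function names, the argument names, and the default `beta` value are illustrative assumptions, and the inputs are assumed to be summed per-sequence log-probabilities that have already been computed by a policy model and a frozen reference model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, chosen only for illustration
) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the log-probability of a full response given the
    same instruction, under the policy or the reference model.
    """
    # Implicit reward of each response: how much the policy has moved
    # away from the reference model on that response.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) equals softplus(-logits).
    return softplus(-logits)


if __name__ == "__main__":
    # The policy prefers the chosen response more than the reference does,
    # so the loss falls below log(2), the value at zero margin.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

This loss compares two responses to one fixed instruction. According to the abstract, IOPO additionally takes input (instruction) preference pairs into consideration.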
Taxonomy
Topics: Semantic Web and Ontologies · Scheduling and Optimization Algorithms
Methods: ALIGN · Direct Preference Optimization · Shrink and Fine-Tune
