IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization

Xinghua Zhang; Haiyang Yu; Cheng Fu; Fei Huang; Yongbin Li

arXiv:2411.06208 · cs.CL · July 18, 2025

TL;DR

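This paper introduces IOPO, an alignment method that considers both input and output preference pairs to improve how well large language models follow complex instructions, together with TRACE, a benchmark of 120K training and 1K evaluation examples. IOPO improves results by up to 8.15% on in-domain data and outperforms SFT and DPO on out-of-domain data.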

Contribution

The paper presents IOPO (Input-Output Preference Optimization), an alignment technique that uses both input and output preference pairs, and TRACE, a benchmark for improving and evaluating complex instruction following in LLMs.

Findings

01 IOPO achieves up to 8.15% improvement on in-domain data.

02 IOPO outperforms SFT and DPO on out-of-domain data.

03 TRACE provides 120K training examples and 1K evaluation examples for complex instruction following.

Abstract

In the realm of large language models (LLMs), the ability of models to accurately follow instructions is paramount as more agents and applications leverage LLMs for construction, where the complexity of instructions is rapidly increasing. However, on the one hand, there is only a certain amount of complex instruction evaluation data; on the other hand, there are no dedicated algorithms to improve the ability to follow complex instructions. To this end, this paper introduces TRACE, a benchmark for improving and evaluating the complex instruction-following ability, which consists of 120K training data and 1K evaluation data. Furthermore, we propose the IOPO (Input-Output Preference Optimization) alignment method, which takes both input and output preference pairs into consideration, where LLMs not only rapidly align with response preferences but also meticulously explore the instruction…

Code & Models

Repositories

alibabaresearch/damo-convai (PyTorch, official)

Videos

IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization

Taxonomy

Topics: Semantic Web and Ontologies · Scheduling and Optimization Algorithms

Methods: ALIGN · Direct Preference Optimization · Shrink and Fine-Tune