Reverse Preference Optimization for Complex Instruction Following

Xiang Huang; Ting-En Lin; Feiteng Fang; Yuchuan Wu; Hangyu Li; Yuzhong Qu; Fei Huang; Yongbin Li

arXiv:2505.22172 · cs.CL · May 29, 2025

TL;DR

This paper introduces Reverse Preference Optimization (RPO), a novel method for improving instruction following in large language models by dynamically reversing constraints to reduce noise and enhance alignment with complex multi-constraint instructions.

Contribution

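RPO dynamically reverses constraints within the instruction so that the chosen response in each preference pair satisfies every constraint. This reduces noise in the pairs and widens the gap between chosen and rejected responses, without extensive sampling and filtering to collect perfect responses.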

Findings

01. RPO outperforms the DPO baseline by 4.6 and 2.5 points, respectively, on two benchmarks.

02. RPO scales effectively from 8B to 70B parameters.

03. The 70B RPO model surpasses GPT-4o.

Abstract

Instruction following (IF) is a critical capability for large language models (LLMs). However, handling complex instructions with multiple constraints remains challenging. Previous methods typically select preference pairs based on the number of constraints they satisfy, introducing noise where chosen examples may fail to follow some constraints and rejected examples may excel in certain respects over the chosen ones. To address the challenge of aligning with multiple preferences, we propose a simple yet effective method called Reverse Preference Optimization (RPO). It mitigates noise in preference pairs by dynamically reversing the constraints within the instruction to ensure the chosen response is perfect, alleviating the burden of extensive sampling and filtering to collect perfect responses. Besides, reversal also enlarges the gap between chosen and rejected responses, thereby…
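The reversal idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Constraint` schema, the hand-written reversed wordings, the rule-based checkers, and both example responses are hypothetical, and the abstract does not say how constraints are verified or how rejected responses are selected. The sketch only shows that reversing the constraints a response violates makes that response satisfy the rewritten instruction completely.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Constraint:
    """One checkable constraint with a reversed wording (hypothetical schema)."""
    text: str
    reversed_text: str
    check: Callable[[str], bool]


def reverse(c: Constraint) -> Constraint:
    """Swap the two wordings and negate the checker."""
    return Constraint(c.reversed_text, c.text, lambda r: not c.check(r))


def reverse_unmet(constraints: List[Constraint], response: str) -> List[Constraint]:
    """Reverse each constraint the response violates, so it satisfies all of them."""
    return [c if c.check(response) else reverse(c) for c in constraints]


def count_satisfied(constraints: List[Constraint], response: str) -> int:
    """Number of constraints the response satisfies."""
    return sum(c.check(response) for c in constraints)


# Toy instruction with three constraints (illustrative only).
constraints = [
    Constraint("Use fewer than 10 words.", "Use at least 10 words.",
               lambda r: len(r.split()) < 10),
    Constraint("Mention Paris.", "Do not mention Paris.",
               lambda r: "Paris" in r),
    Constraint("Do not use exclamation marks.", "Use an exclamation mark.",
               lambda r: "!" not in r),
]

chosen = "Paris is the capital of France!"
rejected = ("The capital of France is a large and famous European city "
            "that many tourists visit.")

# Rewrite the instruction relative to the chosen response.
rewritten = reverse_unmet(constraints, chosen)

before = (count_satisfied(constraints, chosen), count_satisfied(constraints, rejected))
after = (count_satisfied(rewritten, chosen), count_satisfied(rewritten, rejected))
print(before, after)
```

In this toy case the satisfied-constraint counts for (chosen, rejected) move from (2, 1) under the original instruction to (3, 0) under the rewritten one: the chosen response becomes perfect, and the margin over the rejected response grows, which matches the two effects the abstract attributes to reversal.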

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications