Reflective Preference Optimization (RPO): Enhancing On-Policy Alignment via Hint-Guided Reflection
Zihui Zhao, Zechang Li

TL;DR
Reflective Preference Optimization (RPO) improves on-policy alignment of large language and vision-language models by using hint-guided reflection to generate stronger preference signals, leading to faster, more stable training and fewer hallucinations.
Contribution
The paper introduces RPO, a framework that incorporates hints from external models into preference optimization, increasing the contrast between chosen and rejected responses and improving sample efficiency in model alignment.
Findings
RPO achieves better alignment with fewer samples and iterations.
RPO substantially reduces hallucination rates.
RPO delivers state-of-the-art results on multimodal benchmarks.
Abstract
Direct Preference Optimization (DPO) has emerged as a lightweight and effective alternative to Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with AI Feedback (RLAIF) for aligning large language and vision-language models. However, the standard DPO formulation, in which both the chosen and rejected responses are generated by the same policy, suffers from a weak learning signal because the two responses often share similar errors and exhibit small Kullback-Leibler (KL) divergence. This leads to slow and unstable convergence. To address this limitation, we introduce Reflective Preference Optimization (RPO), a new framework that incorporates hint-guided reflection into the DPO paradigm. RPO uses external models to identify hallucination sources and generate concise reflective hints, enabling the construction of on-policy preference pairs with stronger…
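For reference, the standard DPO objective that the abstract builds on can be sketched for a single preference pair as follows. This is a minimal illustration of the widely used DPO loss, not of RPO's hint-guided pair construction, whose details are not given in the excerpt above. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for illustration; the inputs are sequence-level log-probabilities under the policy and a frozen reference model.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    All arguments are hypothetical sequence-level log-probabilities:
    logp_* come from the policy being trained, ref_logp_* from the
    frozen reference model. beta scales the implicit reward.
    """
    # Implicit rewards are the policy-vs-reference log-ratios.
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) computed stably as softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Policy equals reference: the margin is zero and the loss is log(2).
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy favors the chosen response more than the reference does:
    # the margin is positive and the loss drops below log(2).
    print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

The loss depends on the pair only through the difference of the two log-ratios, which is why the choice of chosen and rejected responses matters so much for the strength of the training signal.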
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Machine Learning and Data Classification
