AIPO: Improving Training Objective for Iterative Preference Optimization

Yaojie Shen; Xinyao Wang; Yulei Niu; Ying Zhou; Lexin Tang; Libo Zhang; Fan Chen; Longyin Wen

arXiv:2409.08845·cs.CL·September 16, 2024

TL;DR

This paper introduces AIPO, a new training objective for iterative preference optimization that addresses length exploitation issues, leading to state-of-the-art results in aligning large language models.

Contribution

We propose Agreement-aware Iterative Preference Optimization (AIPO), a novel training objective that improves iterative preference optimization for aligning large language models.

Findings

01

AIPO achieves state-of-the-art performance on MT-Bench.

02

AIPO effectively mitigates length exploitation in iterative preference optimization.

03

Experimental results demonstrate the superiority of AIPO over existing methods.

Abstract

Preference Optimization (PO) is gaining popularity as an alternative to Proximal Policy Optimization (PPO) for aligning Large Language Models (LLMs). Recent research on aligning LLMs iteratively with synthetic or partially synthetic data shows promising results in scaling up PO training for both academic settings and proprietary trained models such as Llama3. Despite this success, our study shows that the length exploitation issue present in PO is even more severe in Iterative Preference Optimization (IPO) because of the iterative nature of the process. In this work, we study iterative preference optimization with synthetic data and share the findings and analysis from building the iterative preference optimization pipeline. More specifically, we discuss the length exploitation issue during iterative preference optimization and propose our training objective for iterative…

Code & Models

Repositories

bytedance/aipo (PyTorch, official)
