ComPO: Preference Alignment via Comparison Oracles

Peter Chen; Xi Chen; Wotao Yin; Tianyi Lin

arXiv:2505.05465 · cs.CL · October 28, 2025

TL;DR

This paper introduces ComPO, a preference alignment method that uses comparison oracles and zeroth-order optimization to handle noisy preference pairs when aligning LLMs with human preferences.

Contribution

Proposes a comparison-based preference alignment method, provides convergence guarantees for its basic scheme, and evaluates a practical variant across multiple models and benchmarks.

Findings

01 Improves LLM performance when the preference data is noisy

02 Outperforms existing direct alignment methods in experiments

03 Highlights the importance of specialized methods for preference pairs with different likelihood margins
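To make the notion of a likelihood margin concrete, here is a minimal sketch that separates preference pairs by the gap between the log-likelihoods of the preferred and dispreferred responses. The function name `split_by_margin`, the `threshold` parameter, and the numbers are hypothetical and chosen only for illustration; the paper's actual criterion may differ.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (log-likelihood of preferred, log-likelihood of dispreferred)


def split_by_margin(pairs: List[Pair], threshold: float) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs into (clean, noisy) by the absolute log-likelihood margin.

    A pair counts as "noisy" here when the model assigns similar likelihood
    to both responses, i.e. |margin| < threshold. The threshold is an
    illustrative assumption, not a value taken from the paper.
    """
    clean: List[Pair] = []
    noisy: List[Pair] = []
    for logp_preferred, logp_dispreferred in pairs:
        margin = logp_preferred - logp_dispreferred
        if abs(margin) < threshold:
            noisy.append((logp_preferred, logp_dispreferred))
        else:
            clean.append((logp_preferred, logp_dispreferred))
    return clean, noisy


# Made-up log-likelihoods for three preference pairs.
example_pairs = [(-10.0, -30.0), (-12.0, -12.5), (-20.0, -19.0)]
clean_pairs, noisy_pairs = split_by_margin(example_pairs, threshold=2.0)
```

Under this reading, the two groups could then be routed to different training procedures, which is the kind of specialization the third finding points to.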

Abstract

Direct alignment methods are increasingly used for aligning large language models (LLMs) with human preferences. However, these methods suffer from verbosity and likelihood displacement, which can be driven by noisy preference pairs that induce similar likelihood for preferred and dispreferred responses. The contributions of this paper are two-fold. First, we propose a new preference alignment method based on zeroth-order, comparison-based optimization via comparison oracles and provide convergence guarantees for its basic scheme. Second, we improve our method using some heuristics and conduct experiments to demonstrate the flexibility and compatibility of the practical scheme in improving the performance of LLMs using noisy preference pairs. Evaluations are conducted across multiple base and instruction-tuned models (Mistral-7B, Llama-3-8B and Gemma-2-9B) with…
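The general idea of zeroth-order optimization with a comparison oracle can be sketched on a toy problem. The code below is a generic illustration, not the ComPO algorithm itself: the oracle only reveals which of two points has the lower objective value, and the optimizer averages sign-weighted random directions to form a normalized step. All names (`comparison_oracle`, `comparison_descent`) and hyperparameters (`num_dirs`, `radius`, `lr`) are assumptions made for this example.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def comparison_oracle(f: Callable[[Sequence[float]], float],
                      x: Sequence[float], y: Sequence[float]) -> int:
    """Return +1 if y is better than x (f(y) < f(x)), else -1.

    Only the outcome of the comparison is exposed, never the values of f.
    """
    return 1 if f(y) < f(x) else -1


def random_unit_vector(dim: int, rng: random.Random) -> Vector:
    """Sample a direction uniformly on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def comparison_descent(f: Callable[[Sequence[float]], float], x0: Sequence[float],
                       steps: int = 200, num_dirs: int = 8,
                       radius: float = 0.1, lr: float = 0.05, seed: int = 0) -> Vector:
    """Minimize f using only pairwise comparisons (toy sketch)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        direction = [0.0] * len(x)
        for _ in range(num_dirs):
            u = random_unit_vector(len(x), rng)
            probe = [xi + radius * ui for xi, ui in zip(x, u)]
            # +u if the probe is better than x, -u otherwise.
            sign = comparison_oracle(f, x, probe)
            direction = [d + sign * ui for d, ui in zip(direction, u)]
        norm = math.sqrt(sum(d * d for d in direction))
        if norm > 0.0:
            # Normalized step along the estimated improving direction.
            x = [xi + lr * d / norm for xi, d in zip(x, direction)]
    return x


def sphere(x: Sequence[float]) -> float:
    """Toy objective: squared Euclidean norm, minimized at the origin."""
    return sum(c * c for c in x)


start = [2.0, -1.5, 1.0]
result = comparison_descent(sphere, start)
```

In a preference alignment setting, the objective would be unavailable in closed form and the comparison would come from preference data rather than from evaluating a known function; the toy `sphere` objective is used here only so that the sketch can run on its own.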

Code & Models

Models

DZgas/Gemma-2-9b-it-SimPO-ComPO-2-GGUF (model, 754 downloads)

Videos

ComPO: Preference Alignment via Comparison Oracles · SlidesLive

Taxonomy

Topics: Topic Modeling · Recommender Systems and Techniques · Constraint Satisfaction and Optimization

Methods: Balanced Selection