Learning Where It Matters: Geometric Anchoring for Robust Preference Alignment
Youngjae Cho, Jongsuk Kim, Ji-Hoon Kim

TL;DR
This paper introduces GAPO, a dynamic geometric anchoring method for preference alignment in language models that enhances robustness against noisy supervision by adaptively weighting preference signals.
Contribution
GAPO replaces static references with a geometry-aware, adversarial anchor, enabling adaptive reweighting and improved robustness in preference optimization.
Findings
GAPO outperforms static reference methods under noisy conditions.
It maintains or improves alignment and reasoning benchmark performance.
The Anchor Gap correlates with local margin degradation, guiding robust optimization.
Abstract
Direct Preference Optimization (DPO) and related methods align large language models from pairwise preferences by regularizing updates against a fixed reference policy. As the policy drifts, however, a static reference can become increasingly miscalibrated, leading to distributional mismatch and amplifying spurious preference signals under noisy supervision. Conversely, reference-free variants avoid this mismatch but often suffer from unconstrained reward drift. We propose Geometric Anchor Preference Optimization (GAPO), which replaces the fixed reference with a dynamic, geometry-aware anchor: an adversarial local perturbation of the current policy within a small radius that serves as a pessimistic baseline. This anchor enables an adaptive reweighting mechanism that modulates the importance of each preference pair based on its local sensitivity. We further introduce the Anchor Gap, the reward…
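To make the contrast in the abstract concrete, here is a minimal Python sketch under stated assumptions. `dpo_loss` is the standard DPO objective for one preference pair with a fixed reference. `worst_case_margin` is only a toy illustration of a "pessimistic baseline inside a small radius": it assumes a hypothetical linear model whose preference margin is `theta . (feat_w - feat_l)`, for which the worst-case L2 perturbation of radius `rho` has a closed form. The names `theta`, `feat_w`, `feat_l`, and `rho` are illustrative and do not come from the paper, and the sketch does not reproduce GAPO's actual anchor, reweighting rule, or Anchor Gap definition, which the truncated abstract does not spell out.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Standard DPO loss for one (chosen, rejected) pair.
    # logp_* are sequence log-probabilities under the current policy,
    # ref_logp_* under the fixed reference policy.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -log_sigmoid(beta * margin)


def worst_case_margin(theta, feat_w, feat_l, rho):
    # Toy assumption (not the paper's method): a linear model with
    # margin(theta) = theta . (feat_w - feat_l).
    # Minimizing the margin over perturbations ||delta||_2 <= rho gives
    # margin - rho * ||feat_w - feat_l||_2.
    diff = [a - b for a, b in zip(feat_w, feat_l)]
    margin = sum(t * d for t, d in zip(theta, diff))
    sensitivity = math.sqrt(sum(d * d for d in diff))
    return margin, margin - rho * sensitivity


if __name__ == "__main__":
    # The same policy log-probs scored against two different references:
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5, beta=1.0))
    print(dpo_loss(-1.0, -2.0, -0.5, -2.5, beta=1.0))
    # Margin at the current parameters vs. at the worst nearby point:
    print(worst_case_margin([1.0, 0.0], [3.0, 4.0], [0.0, 0.0], rho=0.1))
```

In this toy setting, the drop from the nominal margin to the worst-case margin grows with the pair's local sensitivity, which is the kind of per-pair signal an adaptive reweighting scheme could use. How GAPO actually builds and uses its anchor should be taken from the paper itself.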
Taxonomy
Topics: Machine Learning and Data Classification · Constraint Satisfaction and Optimization · Explainable Artificial Intelligence (XAI)
