Latent Adversarial Regularization for Offline Preference Optimization

Enyi Jiang; Yibo Jacky Zhang; Yinglun Xu; Andreas Haupt; Nancy Amato; Sanmi Koyejo

arXiv:2601.22083 · cs.LG · February 3, 2026

TL;DR

This paper introduces GANPO, a latent-space regularization method for offline preference optimization of language models. GANPO uses an adversarial approach to penalize divergence between the internal representations of the policy model and a reference model, which improves robustness and performance.

Contribution

GANPO is a novel latent-space regularization technique that improves preference optimization for language models by using adversarial training to minimize the divergence between the internal representations of the policy and reference models.
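As a hedged illustration, not the paper's stated objective, the idea can be written as a min-max problem. The following are all assumptions introduced here: DPO as the base offline objective, p_ref and p_θ as the distributions of hidden states h produced by the reference and policy models, a discriminator D_φ that treats reference hidden states as "real", and a regularization weight λ.

```latex
\begin{aligned}
&\min_{\theta}\,\max_{\phi}\;
  \mathcal{L}_{\mathrm{DPO}}(\theta)
  + \lambda\Big(\mathbb{E}_{h \sim p_{\mathrm{ref}}}\big[\log D_\phi(h)\big]
  + \mathbb{E}_{h \sim p_{\theta}}\big[\log\big(1 - D_\phi(h)\big)\big]\Big), \\
&\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\log \sigma\Big(
  \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\Big)\Big].
\end{aligned}
```

For a fixed θ and an unrestricted discriminator, the standard GAN result says the inner maximum of the bracketed term equals 2·JSD(p_ref ∥ p_θ) − log 4. Under these assumptions, the λ term therefore acts as a Jensen-Shannon penalty on the latent distributions without requiring their explicit densities.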

Findings

1. GANPO improves preference optimization across multiple models and tasks.
2. Latent-space regularization offers more robust feedback under distributional shift and noise.
3. GANPO maintains comparable downstream performance with minor computational overhead.

Abstract

Learning from human feedback typically relies on preference optimization that constrains policy updates through token-level regularization. However, preference optimization for language models is particularly challenging because token-space similarity does not imply semantic or behavioral similarity. To address this challenge, we leverage latent-space regularization for language model preference optimization. We introduce GANPO, which achieves latent-space regularization by penalizing divergence between the internal representations of a policy model and a reference model. Given that latent representations are not associated with explicit probability densities, we adopt an adversarial approach inspired by GANs to minimize latent-space divergence. We integrate GANPO as a regularizer into existing offline preference optimization objectives. Experiments across multiple model architectures…
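The regularized objective described in the abstract can be sketched in NumPy. This is a minimal, hypothetical sketch and not the authors' implementation. It assumes DPO as the base offline objective, a logistic-regression discriminator over (for example, pooled) hidden states, and the standard minimax GAN losses. The names `LatentDiscriminator`, `adversarial_penalty`, `regularized_loss` and the weight `lam` are illustrative. Random Gaussian vectors stand in for reference and policy hidden states. Only the discriminator is trained here, because NumPy has no autodiff to carry the penalty's gradient back into a policy network.

```python
import numpy as np


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)


def dpo_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss from summed log-probs of chosen (w) and rejected (l) responses."""
    margin = beta * ((pol_w - ref_w) - (pol_l - ref_l))
    return -np.mean(log_sigmoid(margin))


class LatentDiscriminator:
    """Hypothetical logistic discriminator D(h) = sigmoid(h @ w + b).

    Label 1 = reference-model hidden state, label 0 = policy hidden state.
    """

    def __init__(self, dim, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.01, size=dim)
        self.b = 0.0
        self.lr = lr

    def logits(self, h):
        return h @ self.w + self.b

    def loss(self, h_ref, h_pol):
        # GAN discriminator loss: -(E_ref[log D(h)] + E_pol[log(1 - D(h))]),
        # using log(1 - sigmoid(z)) = log_sigmoid(-z).
        return -(np.mean(log_sigmoid(self.logits(h_ref)))
                 + np.mean(log_sigmoid(-self.logits(h_pol))))

    def step(self, h_ref, h_pol):
        # One gradient-descent step on self.loss; d(BCE)/d(logit) = D(h) - label.
        g_ref = np.exp(log_sigmoid(self.logits(h_ref))) - 1.0
        g_pol = np.exp(log_sigmoid(self.logits(h_pol)))
        grad_w = h_ref.T @ g_ref / len(h_ref) + h_pol.T @ g_pol / len(h_pol)
        grad_b = g_ref.mean() + g_pol.mean()
        self.w -= self.lr * grad_w
        self.b -= self.lr * grad_b


def adversarial_penalty(disc, h_pol):
    # Policy-side minimax term E_pol[log(1 - D(h))]. Minimizing it pushes policy
    # hidden states toward what the discriminator labels "reference".
    return np.mean(log_sigmoid(-disc.logits(h_pol)))


def regularized_loss(pol_w, pol_l, ref_w, ref_l, disc, h_pol, beta=0.1, lam=0.1):
    # Hypothetical combination: offline preference loss + weighted latent adversarial term.
    return dpo_loss(pol_w, pol_l, ref_w, ref_l, beta) + lam * adversarial_penalty(disc, h_pol)


# Usage: train only the discriminator on stand-in hidden states.
rng = np.random.default_rng(1)
h_ref = rng.normal(0.0, 1.0, size=(256, 8))  # stand-in reference hidden states
h_pol = rng.normal(1.0, 1.0, size=(256, 8))  # stand-in policy hidden states (shifted)
disc = LatentDiscriminator(dim=8)
before = disc.loss(h_ref, h_pol)
for _ in range(200):
    disc.step(h_ref, h_pol)
after = disc.loss(h_ref, h_pol)
```

A real training loop would presumably alternate discriminator steps with policy steps on `regularized_loss` inside an autodiff framework. The logistic discriminator is simply the smallest choice that keeps the sketch self-contained.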
