Beyond RLHF: A Unified Theoretical Framework of Alignment

Jihun Yun; Juno Kim; Jongho Park; Junhyuck Kim; Jongha Jon Ryu; Jaewoong Cho; Kwang-Sung Jun

arXiv:2506.01523 · cs.LG · May 19, 2026

TL;DR

This paper proposes a unified theoretical framework for aligning large language models. It analyzes several alignment objectives, including RLHF, within a single setting, and backs the resulting guarantees with empirical validation.

Contribution

The paper reframes alignment as distribution learning from pairwise preferences. It proposes three principled objectives with proven convergence guarantees and uses the framework to explain empirical performance differences between methods.

Findings

01 The reverse KL minimization objective resembles the RLHF objective, which helps justify RLHF's effectiveness.

02 On-policy objectives empirically outperform likelihood-style objectives.

03 The proposed objectives are competitive with strong baselines across tasks.
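
The link between reverse KL minimization and RLHF in the first finding can be illustrated with a well-known identity: for a target distribution proportional to pi_ref(y) * exp(r(y) / beta), the reverse KL from a policy pi to that target equals log Z minus the KL-regularized reward divided by beta. The sketch below checks this numerically on a toy four-response example. It is standard background under stated assumptions, not the paper's own derivation; the paper's reverse KL objective and its probabilistic assumption may differ, and every name and number here (`pi_ref`, `reward`, `beta`, `pi`) is hypothetical.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def rlhf_objective(pi, pi_ref, reward, beta):
    """KL-regularized reward: E_pi[r] - beta * KL(pi || pi_ref)."""
    expected_reward = sum(p * r for p, r in zip(pi, reward))
    return expected_reward - beta * kl(pi, pi_ref)

def tilted_target(pi_ref, reward, beta):
    """Return pi*(y) proportional to pi_ref(y) * exp(r(y) / beta), and log Z."""
    weights = [q * math.exp(r / beta) for q, r in zip(pi_ref, reward)]
    z = sum(weights)
    return [w / z for w in weights], math.log(z)

# Hypothetical toy setup: four candidate responses to a single prompt.
pi_ref = [0.4, 0.3, 0.2, 0.1]   # reference policy (assumed)
reward = [0.0, 1.0, 2.0, 0.5]   # reward per response (assumed)
beta = 0.5                      # KL regularization strength (assumed)
pi = [0.1, 0.2, 0.6, 0.1]       # an arbitrary policy to evaluate

pi_star, log_z = tilted_target(pi_ref, reward, beta)

# Identity: KL(pi || pi*) = log Z - (1 / beta) * J(pi),
# where J is the KL-regularized reward above.
lhs = kl(pi, pi_star)
rhs = log_z - rlhf_objective(pi, pi_ref, reward, beta) / beta
print(abs(lhs - rhs) < 1e-9)
```

Because log Z does not depend on pi, minimizing the reverse KL to the tilted target and maximizing the KL-regularized reward select the same policy in this toy setting.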

Abstract

Alignment via reinforcement learning from human feedback (RLHF) has become the dominant paradigm for controlling the quality of outputs from large language models (LLMs). However, existing theories do not provide strong justification for the RLHF objective itself and do not allow comparisons of the guarantees between various methods because different methods are often analyzed under different frameworks. Toward a unified framework for alignment, we ask under what assumptions we can derive existing or new training objectives and obtain theoretical guarantees. To this end, we reframe alignment as distribution learning from pairwise preferences, which makes a probabilistic assumption describing how preferences reveal information about the target LM. This leads us to propose three principled alignment objectives: preference maximum likelihood estimation, preference distillation, and reverse…

Taxonomy

Topics: Natural Language Processing Techniques