On The Global Convergence Of Online RLHF With Neural Parametrization

Mudit Gaur; Amrit Singh Bedi; Raghu Pasupathy; Vaneet Aggarwal

arXiv:2410.15610·cs.LG·May 27, 2025

TL;DR

This paper establishes the first theoretical convergence guarantees for online RLHF with neural network parametrization. It addresses the distribution shift between the reward learning and policy learning stages through a bi-level optimization formulation solved with a first-order method.

Contribution

It introduces a bi-level formulation for neural RLHF, proposes a first-order solution method, and provides the first convergence rate bounds in this setting.
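
For readers unfamiliar with the term, a generic bi-level optimization problem can be sketched as follows. This is only the textbook shape of such a problem, not the paper's own formulation: the symbols x, y, F, and G are placeholders introduced here for illustration, and this summary does not state which RLHF stage (reward learning or policy learning) plays the upper-level role.

```latex
% Generic bi-level problem (illustrative placeholders, not the paper's notation).
% x : upper-level variable, y : lower-level variable,
% F : upper-level objective, G : lower-level objective.
\begin{aligned}
  \min_{x} \quad & F\bigl(x,\, y^{*}(x)\bigr) \\
  \text{s.t.} \quad & y^{*}(x) \in \arg\min_{y} \; G(x, y)
\end{aligned}
```

The point of the nesting is that the lower-level solution y*(x) depends on the upper-level variable, so the two stages cannot be optimized independently. In the RLHF setting described here, this corresponds to the interdependence between reward learning and policy learning that the paper identifies as the source of distribution shift.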

Findings

1. Proposed a bi-level formulation for neural RLHF.

2. Developed a first-order algorithm with convergence guarantees.

3. Achieved state-of-the-art sample complexity bounds.

Abstract

The importance of Reinforcement Learning from Human Feedback (RLHF) in aligning large language models (LLMs) with human values cannot be overstated. RLHF is a three-stage process that includes supervised fine-tuning (SFT), reward learning, and policy learning. Although there are several offline and online approaches to aligning LLMs, they often suffer from distribution shift issues. These issues arise from the inability to accurately capture the distributional interdependence between the reward learning and policy learning stages. Consequently, this has led to various approximated approaches, but the theoretical insights and motivations remain largely limited to tabular settings, which do not hold in practice. This gap between theoretical insights and practical implementations is critical. It is challenging to address this gap as it requires analyzing the performance of AI alignment…
