KL-Regularized RLHF with Multiple Reference Models: Exact Solutions and Sample Complexity

Gholamali Aminian; Amir R. Asadi; Idan Shenfeld; Youssef Mroueh

arXiv:2502.01203 · cs.LG · October 21, 2025

TL;DR

This paper gives the first exact solution for KL-regularized RLHF with multiple reference models, together with a theoretical analysis and sample complexity guarantees, so that LLM alignment can draw on several diverse reference models instead of one.

Contribution

The paper introduces an exact solution to the multiple reference model problem in reverse KL-regularized RLHF and extends the analysis to forward KL regularization, with sample complexity bounds for both settings.

Findings

01 First exact solution for multiple reference models in RLHF

02 Sample complexity guarantees for reverse and forward KL-regularized RLHF

03 Theoretical framework enabling better LLM alignment techniques

Abstract

Recent methods for aligning large language models (LLMs) with human feedback predominantly rely on a single reference model, which limits diversity, risks model overfitting, and underutilizes the wide range of available pre-trained models. Incorporating multiple reference models has the potential to address these limitations by broadening perspectives, reducing bias, and leveraging the strengths of diverse open-source LLMs. However, integrating multiple reference models into reinforcement learning with human feedback (RLHF) frameworks poses significant theoretical challenges, and achieving exact solutions has remained an open problem. This paper presents the first exact solution to the multiple reference model problem in reverse KL-regularized RLHF. We introduce a comprehensive theoretical framework that includes rigorous statistical analysis and provides sample complexity…
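To make the reverse KL setting concrete, the following is a minimal numerical sketch on a toy discrete action space. It assumes an objective of the form E_pi[r] - (1/gamma) * sum_i alpha_i * KL(pi || pi_ref_i) with weights alpha_i summing to one. The names `alphas`, `gamma`, and `reward`, and this exact form of the objective, are illustrative assumptions and are not taken from this page; the paper's formulation may differ. Under that assumption, a standard Lagrangian argument gives a maximizer proportional to the weighted geometric mean of the references times exp(gamma * r), and the code compares it against randomly sampled policies.

```python
import numpy as np

def geometric_mean_reference(refs, alphas):
    """Normalized weighted geometric mean of the reference distributions."""
    log_mix = np.sum(alphas[:, None] * np.log(refs), axis=0)
    log_mix -= log_mix.max()  # shift for numerical stability
    mix = np.exp(log_mix)
    return mix / mix.sum()

def closed_form_policy(refs, alphas, reward, gamma):
    """Candidate maximizer: geometric-mean reference tilted by exp(gamma * reward)."""
    ref = geometric_mean_reference(refs, alphas)
    logits = np.log(ref) + gamma * reward
    logits -= logits.max()
    policy = np.exp(logits)
    return policy / policy.sum()

def objective(policy, refs, alphas, reward, gamma):
    """Expected reward minus the alpha-weighted sum of reverse KL penalties."""
    kls = np.array([np.sum(policy * (np.log(policy) - np.log(r))) for r in refs])
    return float(policy @ reward - (alphas @ kls) / gamma)

# Hypothetical toy problem: 3 reference models over 5 actions.
rng = np.random.default_rng(0)
refs = rng.dirichlet(np.ones(5), size=3)
alphas = np.array([0.5, 0.3, 0.2])  # assumed mixing weights, sum to 1
reward = rng.normal(size=5)
gamma = 2.0                         # assumed inverse regularization strength

pi_star = closed_form_policy(refs, alphas, reward, gamma)
best = objective(pi_star, refs, alphas, reward, gamma)
```

Because the assumed objective is strictly concave in the policy, no sampled policy should score higher than `best`; a check of that kind is a sanity test of the sketch, not a reproduction of the paper's results.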

Taxonomy

Topics: Engineering, Applied Research