Efficient Reinforcement Learning from Human Feedback via Bayesian Preference Inference

Matteo Cercola; Valeria Capretti; Simone Formentin

arXiv:2511.04286 · cs.LG · November 7, 2025

TL;DR

This paper introduces a hybrid framework that combines the scalability of reinforcement learning from human feedback (RLHF) with the sample efficiency of preferential Bayesian optimization (PBO), so that models can learn from fewer human preference queries in high-dimensional optimization and LLM fine-tuning tasks.

Contribution

The paper unifies RLHF and PBO in a single framework by adding an acquisition-driven active preference-querying module to the RLHF pipeline, improving both sample efficiency and performance.

Findings

01 Enhanced sample efficiency in preference optimization

02 Improved performance in LLM fine-tuning

03 Consistent gains across multiple domains

Abstract

Learning from human preferences is a cornerstone of aligning machine learning models with subjective human judgments. Yet, collecting such preference data is often costly and time-consuming, motivating the need for more efficient learning paradigms. Two established approaches offer complementary advantages: RLHF scales effectively to high-dimensional tasks such as LLM fine-tuning, while PBO achieves greater sample efficiency through active querying. We propose a hybrid framework that unifies RLHF's scalability with PBO's query efficiency by integrating an acquisition-driven module into the RLHF pipeline, thereby enabling active and sample-efficient preference gathering. We validate the proposed approach on two representative domains: (i) high-dimensional preference optimization and (ii) LLM fine-tuning. Experimental results demonstrate consistent improvements in both sample efficiency…
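
The idea of acquisition-driven preference querying can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear reward model, a Bradley-Terry preference likelihood, and a small ensemble whose disagreement stands in for Bayesian posterior uncertainty. All names (`bt_prob`, `acquisition`, `select_query`, `update`, `run_demo`) and the simulated annotator are hypothetical and chosen only for illustration.

```python
import math
import random

def bt_prob(r_a, r_b):
    """Bradley-Terry probability that item a is preferred over item b."""
    return 1.0 / (1.0 + math.exp(-(r_a - r_b)))

def reward(w, x):
    """Assumed linear reward model: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def acquisition(ensemble, x_a, x_b):
    """Variance of the predicted preference across ensemble members.

    High variance means the members disagree, so a human label on this
    pair is expected to be informative.
    """
    probs = [bt_prob(reward(w, x_a), reward(w, x_b)) for w in ensemble]
    mean = sum(probs) / len(probs)
    return sum((p - mean) ** 2 for p in probs) / len(probs)

def select_query(ensemble, candidates):
    """Return the index pair (i, j), i < j, with the highest acquisition score."""
    best, best_score = None, -1.0
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            score = acquisition(ensemble, candidates[i], candidates[j])
            if score > best_score:
                best, best_score = (i, j), score
    return best

def update(ensemble, x_win, x_lose, lr=0.5):
    """One gradient-ascent step on the Bradley-Terry log-likelihood per member."""
    for w in ensemble:
        p = bt_prob(reward(w, x_win), reward(w, x_lose))
        for k in range(len(w)):
            w[k] += lr * (1.0 - p) * (x_win[k] - x_lose[k])

def run_demo(seed=0, n_queries=20):
    """Active querying loop with a simulated, noiseless annotator (an assumption)."""
    rng = random.Random(seed)
    true_w = [1.0, -2.0]  # hypothetical hidden preference direction
    candidates = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(8)]
    ensemble = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(5)]
    for _ in range(n_queries):
        i, j = select_query(ensemble, candidates)
        if reward(true_w, candidates[i]) >= reward(true_w, candidates[j]):
            update(ensemble, candidates[i], candidates[j])
        else:
            update(ensemble, candidates[j], candidates[i])
    return ensemble, candidates, true_w
```

In this sketch the acquisition step replaces random pair sampling: each human label is spent on the comparison the current reward models are least sure about. A Bayesian treatment such as the one the abstract alludes to would replace the ensemble variance with a quantity derived from a posterior over rewards.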

Taxonomy

Topics: Recommender Systems and Techniques · Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms