Sample-Efficient Reinforcement Learning from Human Feedback via Information-Directed Sampling
Han Qi, Haochen Yang, Qiaosheng Zhang, Zhuoran Yang

TL;DR
This paper introduces novel, sample-efficient algorithms for reinforcement learning from human feedback using information-directed sampling, with theoretical guarantees and practical approximations applicable to large language models.
Contribution
It develops IDS-based RLHF algorithms with theoretical regret bounds, introduces a surrogate environment and a new distance measure, and proposes a computationally efficient approximate method.
Findings
Achieves Bayesian regret bounds of order $O(H^{3/2}\sqrt{\log(K(\epsilon)) T})$
Specializes to tabular settings with regret of order $\tilde{O}(H^2\sqrt{SAT})$
Proposes an approximate IDS algorithm maintaining sample efficiency
Abstract
We study reinforcement learning from human feedback (RLHF), a critical problem in training large language models, from a theoretical perspective. Our main contribution is the design of novel sample-efficient RLHF algorithms based on information-directed sampling (IDS), an online decision-making principle inspired by information theory. Our algorithms maximize the sum of the value function and a mutual information term that encourages exploration of the unknown environment; this term quantifies the information gained about the environment through observed human feedback data. To tackle the challenge of large state spaces and improve sample efficiency, we construct a simplified \emph{surrogate environment} and introduce a novel distance measure (named the \emph{-distance}), enabling our IDS-based algorithm to achieve a Bayesian regret upper bound of order…
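The selection rule described in the abstract (expected value plus a mutual information bonus) can be illustrated with a minimal sketch. It assumes a finite set of candidate environments with a known posterior, a finite set of candidate policies, and a single binary preference label per query whose mean depends on the environment. The names `ids_select`, `bayes_update`, `lam`, `values`, and `pref_probs` are hypothetical, and this is not the paper's algorithm: it omits the surrogate environment and the new distance measure the authors use to handle large state spaces.

```python
import math
from typing import List, Sequence

# Hypothetical discrete illustration of an IDS-style selection rule.
# Assumptions (not from the paper): finitely many candidate environments with a
# known posterior, finitely many candidate policies, and one binary preference
# label Y per query whose mean depends on the environment.


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli(p) random variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def mutual_information(posterior: Sequence[float], pref_probs: Sequence[float]) -> float:
    """I(E; Y) = H(Y) - E_E[H(Y | E)], where P(Y = 1 | E = e) = pref_probs[e]."""
    marginal = sum(w * q for w, q in zip(posterior, pref_probs))
    conditional = sum(w * binary_entropy(q) for w, q in zip(posterior, pref_probs))
    return binary_entropy(marginal) - conditional


def ids_select(
    posterior: Sequence[float],
    values: Sequence[Sequence[float]],
    pref_probs: Sequence[Sequence[float]],
    lam: float,
) -> int:
    """Index of the policy maximizing E[V] + lam * I(E; Y | policy).

    values[e][k]:     value of policy k in candidate environment e.
    pref_probs[e][k]: P(Y = 1 | environment e, policy k).
    lam:              hypothetical weight on the information term.
    """
    num_envs = len(posterior)
    best_k, best_score = 0, -math.inf
    for k in range(len(values[0])):
        expected_value = sum(posterior[e] * values[e][k] for e in range(num_envs))
        info = mutual_information(posterior, [pref_probs[e][k] for e in range(num_envs)])
        score = expected_value + lam * info
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def bayes_update(
    posterior: Sequence[float],
    pref_probs: Sequence[Sequence[float]],
    k: int,
    y: int,
) -> List[float]:
    """Posterior over environments after observing label y in {0, 1} for policy k."""
    likelihood = [q[k] if y == 1 else 1.0 - q[k] for q in pref_probs]
    unnormalized = [w * lik for w, lik in zip(posterior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    prior = [0.5, 0.5]
    # Policy 0: safe value 0.6 everywhere, uninformative feedback.
    # Policy 1: value 1.0 or 0.0 depending on the environment, informative feedback.
    values = [[0.6, 1.0], [0.6, 0.0]]
    pref_probs = [[0.5, 0.9], [0.5, 0.1]]
    print(ids_select(prior, values, pref_probs, lam=0.0))  # 0: pure exploitation
    print(ids_select(prior, values, pref_probs, lam=1.0))  # 1: information bonus wins
    print(bayes_update(prior, pref_probs, k=1, y=1))       # approx [0.9, 0.1]
```

With `lam = 0` the rule is greedy and picks the policy with the higher posterior-expected value; a larger `lam` lets the information gained from feedback outweigh a small loss in immediate value, which is the exploration effect the mutual information term is meant to provide.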
Taxonomy
Topics: Reinforcement Learning in Robotics · Distributed Sensor Networks and Detection Algorithms · Neural Networks and Applications
