Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning
Dingwen Kong, Lin F. Yang

TL;DR
This paper introduces a provably feedback-efficient reinforcement learning algorithm that actively learns reward functions with minimal human queries, ensuring near-optimal policies even with noisy feedback.
Contribution
It proposes a theoretically grounded active-learning-based RL framework that reduces human feedback requirements for reward specification in RL tasks.
Findings
Achieves near-optimal policies with $\tilde{O}(H \text{dim}_R^2)$ reward queries.
Handles noisy feedback effectively, maintaining query efficiency.
Outperforms standard RL in reward query complexity by a significant margin.
Abstract
An appropriate reward function is of paramount importance in specifying a task in reinforcement learning (RL). Yet, it is known to be extremely challenging in practice to design a correct reward function for even simple tasks. Human-in-the-loop (HiL) RL allows humans to communicate complex goals to the RL agent by providing various types of feedback. However, despite achieving great empirical successes, HiL RL usually requires too much feedback from a human teacher and also suffers from insufficient theoretical understanding. In this paper, we focus on addressing this issue from a theoretical perspective, aiming to provide provably feedback-efficient algorithmic frameworks that take human-in-the-loop to specify rewards of given tasks. We provide an active-learning-based RL algorithm that first explores the environment without specifying a reward function and then asks a human teacher…
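To make the idea of asking a teacher for only a few rewards concrete, here is a minimal sketch of an uncertainty-thresholded query rule. It is not the paper's algorithm: it assumes a linear reward model with ridge regression, and the names `active_reward_queries`, `teacher`, `threshold`, and `reg` are hypothetical. The teacher is queried only on feature vectors whose predicted reward is still uncertain, so the number of queries can stay far below the number of candidate points.

```python
import math
import random


def active_reward_queries(features, teacher, threshold=0.1, reg=1.0):
    """Fit a linear reward model, querying `teacher` only on uncertain points.

    Assumption for illustration: reward(x) is roughly <theta, x>, and the
    uncertainty of x is measured by the width x^T A^{-1} x, where A is the
    regularized Gram matrix of the points queried so far.
    """
    d = len(features[0])
    # Inverse of the regularized Gram matrix, initialised to (1/reg) * I.
    a_inv = [[(1.0 / reg if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [0.0] * d  # running sum of label * feature over queried points
    n_queries = 0
    for x in features:
        ax = [sum(a_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
        width = sum(x[i] * ax[i] for i in range(d))
        if width <= threshold:
            continue  # the model is already confident here, so skip the query
        y = teacher(x)  # one (possibly noisy) reward label from the teacher
        n_queries += 1
        # Sherman-Morrison rank-one update of the inverse Gram matrix.
        denom = 1.0 + width
        for i in range(d):
            for j in range(d):
                a_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(d):
            b[i] += y * x[i]
    theta = [sum(a_inv[i][j] * b[j] for j in range(d)) for i in range(d)]
    return theta, n_queries


# Usage: 300 candidate points on the unit circle and a noisy simulated teacher.
random.seed(0)
angles = [random.uniform(0.0, 2.0 * math.pi) for _ in range(300)]
points = [[math.cos(a), math.sin(a)] for a in angles]


def noisy_teacher(x):
    # Hypothetical ground-truth reward weights plus Gaussian label noise.
    return 0.8 * x[0] - 0.3 * x[1] + random.gauss(0.0, 0.1)


theta_hat, used = active_reward_queries(points, noisy_teacher)
print(f"queried {used} of {len(points)} points, estimate {theta_hat}")
```

In this sketch the threshold trades label cost against accuracy: a smaller `threshold` triggers more queries and gives a tighter estimate of the reward weights.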
Taxonomy
Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Advanced Bandit Algorithms Research
