Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning

Dingwen Kong; Lin F. Yang

arXiv:2304.08944·cs.LG·April 19, 2023·1 cites


TL;DR

This paper introduces a provably feedback-efficient reinforcement learning algorithm that actively learns reward functions with minimal human queries, ensuring near-optimal policies even with noisy feedback.

Contribution

It proposes a theoretically grounded active-learning-based RL framework that reduces human feedback requirements for reward specification in RL tasks.

Findings

01

Achieves near-optimal policies with $\tilde{O}(H \text{dim}_R^2)$ reward queries.

02

Handles noisy feedback effectively, maintaining query efficiency.

03

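Requires substantially fewer reward queries than standard RL algorithms, whose reward-query needs grow with the target accuracy and the complexity of the environment's transitions.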

Abstract

An appropriate reward function is of paramount importance in specifying a task in reinforcement learning (RL). Yet, it is known to be extremely challenging in practice to design a correct reward function for even simple tasks. Human-in-the-loop (HiL) RL allows humans to communicate complex goals to the RL agent by providing various types of feedback. However, despite achieving great empirical successes, HiL RL usually requires too much feedback from a human teacher and also suffers from insufficient theoretical understanding. In this paper, we focus on addressing this issue from a theoretical perspective, aiming to provide provably feedback-efficient algorithmic frameworks that take human-in-the-loop to specify rewards of given tasks. We provide an active-learning-based RL algorithm that first explores the environment without specifying a reward function and then asks a human teacher…
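The query phase described above can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a linear reward model over hypothetical state-action features, stands in for the reward-free exploration phase with a randomly generated candidate pool, and simulates the human teacher with a noisy function. The names `active_reward_queries`, `noisy_teacher`, `true_theta`, and `width_tol` are all made up for illustration. The sketch picks, at each step, the candidate whose reward is most uncertain under the current data and stops once every candidate is well covered.

```python
import numpy as np


def active_reward_queries(features, teacher, budget, reg=1.0, width_tol=0.1):
    """Greedy uncertainty sampling for a linear reward model (illustrative only)."""
    d = features.shape[1]
    cov = reg * np.eye(d)      # regularized design matrix
    target = np.zeros(d)       # running sum of feature * label
    queried = []
    for _ in range(budget):
        cov_inv = np.linalg.inv(cov)
        # Uncertainty width x^T cov^{-1} x for every candidate in the pool.
        widths = np.einsum("ij,jk,ik->i", features, cov_inv, features)
        idx = int(np.argmax(widths))
        if widths[idx] <= width_tol:
            break              # every candidate is already well covered
        x = features[idx]
        y = teacher(x)         # one (possibly noisy) reward query
        cov += np.outer(x, x)
        target += y * x
        queried.append(idx)
    # Ridge-regression estimate of the reward weights from the queried labels.
    theta_hat = np.linalg.solve(cov, target)
    return theta_hat, queried


rng = np.random.default_rng(0)
true_theta = np.array([0.5, -0.2, 0.8])   # hypothetical ground-truth reward weights

# Assumption: unit-norm features of state-action pairs found during exploration.
pool = rng.normal(size=(500, 3))
pool /= np.linalg.norm(pool, axis=1, keepdims=True)


def noisy_teacher(x):
    """Simulated teacher: true linear reward plus Gaussian noise."""
    return float(x @ true_theta + 0.05 * rng.normal())


theta_hat, queried = active_reward_queries(pool, noisy_teacher, budget=200)
print(len(queried), np.linalg.norm(theta_hat - true_theta))
```

In this toy setting the stopping rule depends only on the feature pool and the threshold `width_tol`, not on the size of the pool, which loosely mirrors the idea that query cost should scale with the complexity of the reward class rather than with the environment.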


Videos

Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Advanced Bandit Algorithms Research