Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback

Yu Chen; Yihan Du; Pihe Hu; Siwei Wang; Desheng Wu; Longbo Huang

arXiv:2307.02842 · cs.LG · December 5, 2023

Open Access

TL;DR

This paper introduces a provably efficient risk-sensitive reinforcement learning framework based on an Iterated CVaR objective with function approximation and human feedback, which guarantees safety at each decision-making step and, in the linear setting, comes with a matching lower bound.

Contribution

It presents a novel, theoretically grounded risk-sensitive RL approach that incorporates human feedback and guarantees safety at each decision-making step, with provable sample efficiency and, in the linear setting, a matching lower bound that establishes optimality.

Findings

01

The proposed algorithms are provably sample-efficient.

02

Established a matching lower bound in the linear setting, corroborating the optimality of the algorithms.

03

Provided formulations that guarantee safety at each decision-making step, including in human-in-the-loop systems.

Abstract

Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance expected reward and risk. In this paper, we present a novel risk-sensitive RL framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximation, enriched by human feedback. These new formulations provide a principled way to guarantee safety at each decision-making step throughout the control process. Moreover, integrating human feedback into the risk-sensitive RL framework bridges the gap between algorithmic decision-making and human participation, allowing us to also guarantee safety for human-in-the-loop systems. We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis. Furthermore, we establish a matching lower bound to corroborate the optimality of our algorithms in a linear…
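The abstract does not spell out the objective, so the following is only a minimal sketch of what an Iterated CVaR backup computes, assuming the formulation commonly used in this line of work: Q_h(s, a) = r(s, a) + CVaR_α over s' ~ P(·|s, a) of V_{h+1}(s'), and V_h(s) = max_a Q_h(s, a). Here CVaR at level α is the mean of the worst α-fraction of next-state values, so the expectation in the usual Bellman equation is replaced by a tail average at every step. This is not the paper's algorithm: it assumes a small tabular MDP with known transitions P and rewards R, and it omits function approximation, exploration, and human feedback entirely. The functions `cvar_lower`, `iterated_cvar_value_iteration`, and `make_toy_mdp` are hypothetical names introduced for illustration.

```python
import numpy as np


def cvar_lower(values, probs, alpha):
    """CVaR at level alpha of a discrete reward distribution (lower tail).

    Rockafellar-Uryasev form: CVaR_alpha(X) = max_x { x - E[(x - X)^+] / alpha }.
    The objective is concave and piecewise linear in x with kinks at the
    support points, so it suffices to search over the listed values.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    best = -np.inf
    for x in values:
        shortfall = np.maximum(x - values, 0.0)  # (x - X)^+ for every outcome
        best = max(best, x - probs @ shortfall / alpha)
    return float(best)


def iterated_cvar_value_iteration(P, R, H, alpha):
    """Finite-horizon backward induction with an Iterated CVaR backup.

    P: array (S, A, S) of known transition probabilities (an assumption).
    R: array (S, A) of rewards, assumed identical at every step.
    Returns the step-1 value function V_1 and a greedy policy of shape (H, S).
    """
    S, A, _ = P.shape
    V = np.zeros(S)  # V_{H+1} = 0 beyond the horizon
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # CVaR over next states replaces the expectation E_{s'}[V(s')]
                Q[s, a] = R[s, a] + cvar_lower(V, P[s, a], alpha)
        policy[h] = Q.argmax(axis=1)  # ties go to the lowest action index
        V = Q.max(axis=1)
    return V, policy


def make_toy_mdp():
    """Hypothetical 3-state MDP: state 0 start, state 1 good, state 2 bad."""
    S, A = 3, 2
    P = np.zeros((S, A, S))
    R = np.zeros((S, A))
    P[0, 0, 0] = 1.0                     # action 0 (safe): stay in state 0
    P[0, 1, 1], P[0, 1, 2] = 0.7, 0.3    # action 1 (risky): good or bad
    P[1, :, 1] = 1.0                     # good and bad states are absorbing
    P[2, :, 2] = 1.0
    R[0, :] = 0.5
    R[1, :] = 1.0
    R[2, :] = 0.0
    return P, R


if __name__ == "__main__":
    P, R = make_toy_mdp()
    for alpha in (1.0, 0.2):
        V, policy = iterated_cvar_value_iteration(P, R, H=2, alpha=alpha)
        print(f"alpha={alpha}: V_1(s0)={V[0]:.2f}, first action in s0={policy[0, 0]}")
```

With α = 1 the CVaR equals the ordinary expectation, so the risky action (1) is preferred because its expected continuation value (0.7) exceeds the safe one (0.5). With α = 0.2 the worst 20% of the risky action's outcomes all land in the bad state, its CVaR drops to 0, and the safe action (0) is chosen instead, which illustrates how the per-step CVaR backup enforces risk aversion at every decision.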

Taxonomy

Topics: Reinforcement Learning in Robotics