Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback
Yu Chen, Yihan Du, Pihe Hu, Siwei Wang, Desheng Wu, Longbo Huang

TL;DR
This paper introduces a provably efficient risk-sensitive reinforcement learning framework using Iterated CVaR with function approximation and human feedback, ensuring safety and optimality in decision-making.
Contribution
It presents a novel, theoretically grounded RL approach that incorporates human feedback and guarantees safety, with provable sample efficiency and optimality in linear settings.
Findings
Proposed algorithms are provably sample-efficient.
Established a matching lower bound for linear cases.
Demonstrated safety guarantees in decision-making processes.
Abstract
Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance the expected reward and risk. In this paper, we present a novel risk-sensitive RL framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations, enriched by human feedback. These new formulations provide a principled way to guarantee safety in each decision making step throughout the control process. Moreover, integrating human feedback into risk-sensitive RL framework bridges the gap between algorithmic decision-making and human participation, allowing us to also guarantee safety for human-in-the-loop systems. We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis. Furthermore, we establish a matching lower bound to corroborate the optimality of our algorithms in a linear…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsReinforcement Learning in Robotics
