Is Risk-Sensitive Reinforcement Learning Properly Resolved?
Ruiwen Zhou, Minghuan Liu, Kan Ren, Xufang Luo, Weinan Zhang, Dongsheng Li

TL;DR
This paper critically examines current risk-sensitive reinforcement learning methods, proves that they are biased, and introduces a new algorithm, Trajectory Q-Learning (TQL), that guarantees policy improvement for risk-sensitive objectives.
Contribution
It provides a theoretical analysis showing existing methods do not properly optimize risk measures and proposes a novel, provably improving algorithm for risk-sensitive reinforcement learning.
Findings
Existing methods do not achieve unbiased optimization of risk measures.
The proposed TQL algorithm guarantees policy improvement towards optimal risk-sensitive policies.
Experimental results demonstrate the effectiveness of TQL in learning better risk-sensitive policies.
Abstract
Because learning deployable policies requires managing risk, risk-sensitive reinforcement learning (RSRL) has been recognized as an important direction. RSRL is usually achieved by learning risk-sensitive objectives characterized by various risk measures, under the framework of distributional reinforcement learning. However, it remains unclear whether the distributional Bellman operator properly optimizes the RSRL objective in the sense of risk measures. In this paper, we prove that existing RSRL methods do not achieve unbiased optimization and cannot guarantee optimality, or even improvement, with respect to risk measures over accumulated return distributions. To remedy this issue, we further propose a novel algorithm, namely Trajectory Q-Learning (TQL), for RSRL problems with provable policy improvement towards the optimal policy. Based on our new learning architecture, we are free to…
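The claim that step-wise risk-greedy action selection need not optimize a risk measure of the accumulated return can be illustrated with a toy calculation. The sketch below is not the paper's construction or the TQL algorithm. It assumes a hypothetical two-branch problem with invented numbers, uses lower-tail CVaR as the risk measure, and defines the names `cvar`, `branch_a`, `action_x`, `action_y`, and `total_return` purely for illustration.

```python
# Toy illustration (hypothetical numbers, not taken from the paper).
# With probability 0.5 the episode enters branch A and returns -10.
# With probability 0.5 it enters branch B, where the agent picks action x or y.

ALPHA = 0.75  # assumed risk level: average over the worst 75% of outcomes


def cvar(outcomes, alpha):
    """Mean of the worst alpha-fraction of a discrete distribution.

    outcomes is a list of (value, probability) pairs whose probabilities sum to 1.
    """
    remaining = alpha
    total = 0.0
    for value, prob in sorted(outcomes):  # worst values first
        take = min(prob, remaining)
        total += take * value
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


branch_a = [(-10.0, 1.0)]              # return if the episode enters branch A
action_x = [(1.0, 1.0)]                # safe action in branch B
action_y = [(0.0, 0.5), (100.0, 0.5)]  # risky action in branch B
actions = {"x": action_x, "y": action_y}


def total_return(action_dist):
    """Return distribution seen from the start state (50/50 mix of A and B)."""
    mix_a = [(v, 0.5 * p) for v, p in branch_a]
    mix_b = [(v, 0.5 * p) for v, p in action_dist]
    return mix_a + mix_b


# Risk measure of the return-to-go in branch B (what a step-wise greedy rule sees).
local_cvar = {name: cvar(dist, ALPHA) for name, dist in actions.items()}
# Risk measure of the accumulated return from the start state (the actual objective).
global_cvar = {name: cvar(total_return(dist), ALPHA) for name, dist in actions.items()}

greedy_local = max(local_cvar, key=local_cvar.get)
best_global = max(global_cvar, key=global_cvar.get)

print("local CVaR in B:", local_cvar, "-> greedy picks", greedy_local)
print("CVaR of total return:", global_cvar, "-> best is", best_global)
```

In this toy setting the greedy rule prefers y inside branch B, while x yields the higher CVaR of the total return, because the tail of the total return covers only the worst half of branch B's outcomes rather than the worst 75%.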
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Q-Learning
