Decoupling Time and Risk: Risk-Sensitive Reinforcement Learning with General Discounting
Mehrdad Moghimi, Anthony Coache, Hyejin Ku

TL;DR
This paper introduces a flexible framework for distributional reinforcement learning that decouples time and risk preferences, allowing for more expressive modeling of future rewards and risk in safety-critical domains.
Contribution
It proposes a novel multi-horizon distributional RL framework that supports general discounting, addressing limitations of fixed discount factors and improving risk-sensitive decision-making.
Findings
The framework effectively captures diverse temporal and risk preferences.
Experimental results demonstrate robustness and improved performance.
The approach offers new insights into the role of discounting in RL.
Abstract
Distributional reinforcement learning (RL) is a powerful framework increasingly adopted in safety-critical domains for its ability to optimize risk-sensitive objectives. However, the role of the discount factor is often overlooked, as it is typically treated as a fixed parameter of the Markov decision process or as a tunable hyperparameter, with little consideration of its effect on the learned policy. In the literature, it is well known that the discounting function plays a major role in characterizing the time preferences of an agent, which an exponential discount factor cannot fully capture. Building on this insight, we propose a novel framework that supports flexible discounting of future rewards and optimization of risk measures in distributional RL. We provide a technical analysis of the optimality of our algorithms, show that our multi-horizon extension fixes issues raised with existing…
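The two ingredients the abstract names, a discounting function and a risk measure, can be illustrated with a small sketch. This is not the paper's algorithm: hyperbolic discounting is used here only as one familiar example of a non-exponential discounting function, CVaR only as one common risk measure, and the function names, parameter values, and synthetic rewards are all assumptions made for illustration.

```python
import numpy as np

def exponential_discount(t, gamma=0.95):
    # Standard exponential weights gamma^t (gamma is an illustrative value).
    return gamma ** t

def hyperbolic_discount(t, k=0.1):
    # Hyperbolic weights 1 / (1 + k t): one example of a non-exponential
    # discounting function (k is an illustrative value).
    return 1.0 / (1.0 + k * t)

def discounted_return(rewards, discount_fn):
    # rewards has shape (..., T); weights are applied along the time axis.
    rewards = np.asarray(rewards, dtype=float)
    t = np.arange(rewards.shape[-1])
    return (rewards * discount_fn(t)).sum(axis=-1)

def cvar(samples, alpha=0.1):
    # Empirical lower-tail CVaR: mean of the worst alpha-fraction of samples.
    samples = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil(alpha * samples.size)))
    return samples[:k].mean()

# Synthetic reward sequences (hypothetical data): 1000 episodes of 50 steps.
rng = np.random.default_rng(0)
rewards = rng.normal(loc=1.0, scale=1.0, size=(1000, 50))

returns_exp = discounted_return(rewards, exponential_discount)
returns_hyp = discounted_return(rewards, hyperbolic_discount)

# Same rewards, different time preferences: the return distribution changes,
# and so does any risk measure computed on it.
print("exponential: mean", returns_exp.mean(), "CVaR", cvar(returns_exp))
print("hyperbolic:  mean", returns_hyp.mean(), "CVaR", cvar(returns_hyp))
```

Keeping the discounting function and the risk measure as separate arguments mirrors the idea of treating time preference and risk preference as independent choices, although how the paper combines them inside a distributional RL update is not shown here.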
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
