Risk-Sensitive Exponential Actor Critic
Alonso Granados, Jason Pacheco

TL;DR
This paper introduces rsEAC, a risk-sensitive actor-critic algorithm that learns risk-aware policies on complex continuous tasks with more numerically stable updates, supported by new policy gradient theorems for the entropic risk measure.
Contribution
The paper provides a theoretical foundation for policy gradients on the entropic risk measure and proposes rsEAC, a novel off-policy method avoiding explicit exponential value functions.
Findings
rsEAC achieves more stable updates than existing methods.
rsEAC learns risk-sensitive policies in MuJoCo tasks.
The paper provides theoretical justification for risk-sensitive policy gradients, including on- and off-policy gradient theorems for stochastic and deterministic policies.
Abstract
Model-free deep reinforcement learning (RL) algorithms have achieved tremendous success on a range of challenging tasks. However, safety concerns remain when these methods are deployed on real-world applications, necessitating risk-aware agents. A common utility for learning such risk-aware agents is the entropic risk measure, but current policy gradient methods optimizing this measure must perform high-variance and numerically unstable updates. As a result, existing risk-sensitive model-free approaches are limited to simple tasks and tabular settings. In this paper, we provide a comprehensive theoretical justification for policy gradient methods on the entropic risk measure, including on- and off-policy gradient theorems for the stochastic and deterministic policy settings. Motivated by theory, we propose risk-sensitive exponential actor-critic (rsEAC), an off-policy model-free…
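To make the numerical-stability concern concrete, here is a minimal sketch of the entropic risk measure estimated from sampled returns. It is not the rsEAC algorithm and is not taken from the paper; it only illustrates why exponentiating returns directly can fail. The sketch assumes the common convention (1/β) log E[exp(βR)], under which β < 0 is risk-averse and β > 0 is risk-seeking; the paper's own sign convention may differ. The function names and sample values are hypothetical.

```python
import math


def entropic_risk_naive(returns, beta):
    """Estimate (1/beta) * log(mean(exp(beta * r))) by direct exponentiation.

    Illustrative only: math.exp raises OverflowError once beta * r exceeds ~709.
    """
    mean_exp = sum(math.exp(beta * r) for r in returns) / len(returns)
    return math.log(mean_exp) / beta


def entropic_risk_stable(returns, beta):
    """Same estimate, computed with the log-sum-exp shift for stability.

    Subtracting the maximum exponent before calling exp keeps every
    exponentiated term in (0, 1], so nothing overflows.
    """
    scaled = [beta * r for r in returns]
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / beta


if __name__ == "__main__":
    samples = [1000.0, 1001.0, 1002.0]  # hypothetical episode returns
    # Risk-averse setting (beta < 0): the estimate sits below the sample mean.
    print(entropic_risk_stable(samples, beta=-1.0))
    try:
        entropic_risk_naive(samples, beta=1.0)
    except OverflowError:
        print("naive estimate overflowed")
```

Under the assumed convention, a risk-averse β pulls the estimate toward the worst sampled returns and a risk-seeking β pulls it toward the best, which is why large-magnitude returns combined with the exponential make direct evaluation fragile.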
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control
