Stochastic Actor-Critic: Mitigating Overestimation via Temporal Aleatoric Uncertainty
Uğurcan Özalp

TL;DR
This paper introduces STAC, a reinforcement learning algorithm that mitigates overestimation by modeling temporal aleatoric uncertainty with a distributional critic and dropout, leading to more stable and risk-averse policies.
Contribution
STAC leverages temporal aleatoric uncertainty with a single distributional critic and dropout, offering a novel approach to reduce overestimation without relying on epistemic uncertainty.
Findings
Pessimism from a distributional critic alone reduces overestimation.
Dropout improves training stability and performance.
STAC achieves better computational efficiency.
Abstract
Off-policy actor-critic methods in reinforcement learning train a critic with temporal-difference updates and use it as a learning signal for the policy (actor). This design typically achieves higher sample efficiency than purely on-policy methods. However, critic networks tend to systematically overestimate values. This is often addressed by introducing a pessimistic bias based on uncertainty estimates. Current methods employ ensembling to quantify the critic's epistemic uncertainty (uncertainty due to limited data and model ambiguity) to scale pessimistic updates. In this work, we propose a new algorithm called Stochastic Actor-Critic (STAC) that incorporates temporal (one-step) aleatoric uncertainty (uncertainty arising from stochastic transitions, rewards, and policy-induced variability in Bellman targets) to scale pessimistic bias in temporal-difference updates, rather than…
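To make the idea of scaling pessimism by aleatoric uncertainty concrete, here is a minimal sketch. The abstract is cut off before it gives STAC's exact update, so everything below is an assumption for illustration: the critic is taken to predict a Gaussian (mean and standard deviation) over the one-step return, the temporal-difference target is lowered by a hypothetical coefficient `beta` times the predicted standard deviation, and the critic is fit with a Gaussian negative log-likelihood. The function names and default values are not from the paper.

```python
import math


def pessimistic_td_target(reward, next_mean, next_std,
                          gamma=0.99, beta=0.5, done=False):
    """One-step TD target, lowered in proportion to the critic's predicted std.

    Assumption: the critic outputs (next_mean, next_std) for the next
    state-action pair; beta is a hypothetical pessimism coefficient.
    """
    if done:
        # No bootstrapping past a terminal transition.
        return reward
    return reward + gamma * (next_mean - beta * next_std)


def gaussian_nll(target, mean, std):
    """Negative log-likelihood of `target` under N(mean, std**2).

    A common loss for fitting a Gaussian distributional critic (assumed here).
    """
    var = std ** 2
    return 0.5 * (math.log(2.0 * math.pi * var) + (target - mean) ** 2 / var)


# Example: a larger predicted std yields a lower (more pessimistic) target.
low_noise = pessimistic_td_target(1.0, 10.0, 0.5, gamma=0.9)
high_noise = pessimistic_td_target(1.0, 10.0, 2.0, gamma=0.9)
```

With `beta = 0` this reduces to the ordinary one-step target, so under these assumptions the coefficient directly controls how much pessimism the predicted uncertainty injects.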
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control
