Stochastic Actor-Critic: Mitigating Overestimation via Temporal Aleatoric Uncertainty

Uğurcan Özalp

arXiv:2601.00737 · cs.LG · January 5, 2026

Open Access

TL;DR

This paper introduces STAC, a reinforcement learning algorithm that mitigates overestimation by modeling temporal aleatoric uncertainty with a distributional critic and dropout, leading to more stable and risk-averse policies.

Contribution

STAC leverages temporal aleatoric uncertainty with a single distributional critic and dropout, offering a novel approach to reduce overestimation without relying on epistemic uncertainty.

Findings

1. Pessimism derived from the distributional critic alone reduces overestimation.

2. Dropout improves training stability and performance.

3. STAC is more computationally efficient, since it uses a single critic rather than an ensemble.

Abstract

Off-policy actor-critic methods in reinforcement learning train a critic with temporal-difference updates and use it as a learning signal for the policy (actor). This design typically achieves higher sample efficiency than purely on-policy methods. However, critic networks tend to overestimate values systematically. This is often addressed by introducing a pessimistic bias based on uncertainty estimates. Current methods employ ensembling to quantify the critic's epistemic uncertainty (uncertainty due to limited data and model ambiguity) and use it to scale pessimistic updates. In this work, we propose a new algorithm called Stochastic Actor-Critic (STAC) that incorporates temporal (one-step) aleatoric uncertainty (uncertainty arising from stochastic transitions, rewards, and policy-induced variability in Bellman targets) to scale pessimistic bias in temporal-difference updates, rather than…
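
The abstract is cut off above, so the exact STAC update rule is not shown on this page. As a rough illustration of what "scaling pessimistic bias by aleatoric uncertainty" could look like, the sketch below assumes a critic that predicts a Gaussian mean and standard deviation for the next state-action value, and forms a temporal-difference target that subtracts a multiple of that standard deviation. The function name `pessimistic_td_target`, the coefficient `beta`, and the Gaussian form are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def pessimistic_td_target(reward, done, next_mean, next_std,
                          gamma=0.99, beta=0.5):
    """Illustrative pessimistic one-step TD target.

    Assumption (not from the paper): the critic outputs a Gaussian
    (next_mean, next_std) for the next state-action value, and
    pessimism is applied by subtracting beta * next_std.
    `beta` is a hypothetical pessimism coefficient.
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    next_mean = np.asarray(next_mean, dtype=float)
    next_std = np.asarray(next_std, dtype=float)

    # Lower the bootstrapped value in proportion to the predicted
    # one-step (aleatoric) spread of the critic's distribution.
    pessimistic_next_value = next_mean - beta * next_std

    # Standard TD bootstrap; terminal transitions do not bootstrap.
    return reward + gamma * (1.0 - done) * pessimistic_next_value


if __name__ == "__main__":
    targets = pessimistic_td_target(
        reward=[1.0, 1.0],
        done=[0.0, 1.0],
        next_mean=[10.0, 10.0],
        next_std=[2.0, 2.0],
        gamma=0.9,
        beta=0.5,
    )
    print(targets)
```

With `beta=0` the target reduces to the usual expected-value bootstrap, so under these assumptions the coefficient controls how strongly the predicted spread lowers the target. Unlike ensemble-based pessimism, this needs only one critic forward pass per target.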

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control