Pseudo-Quantized Actor-Critic Algorithm for Robustness to Noisy Temporal Difference Error

Taisuke Kobayashi

arXiv:2604.01613 · cs.LG · April 3, 2026

TL;DR

This paper introduces a reinforcement learning algorithm that pseudo-quantizes temporal difference (TD) errors to make learning robust to noisy TD errors, reducing reliance on heuristics such as target networks and ensembles.

Contribution

The paper proposes an algorithm based on control as inference that decomposes optimality, enabling pseudo-quantization of TD errors for improved noise robustness in RL.

Findings

01 Demonstrates stable learning in benchmarks with noisy rewards.

02 Reduces reliance on heuristics like target networks and ensembles.

03 Achieves robustness without increased computational cost.

Abstract

In reinforcement learning (RL), temporal difference (TD) errors are widely used to optimize value and policy functions. However, because the TD error is defined by bootstrapping, its computation tends to be noisy and can destabilize learning. Heuristics that improve the accuracy of TD errors, such as target networks and ensemble models, have been introduced to address this. While these are essential to current deep RL algorithms, they cause side effects such as increased computational cost and reduced learning efficiency. Therefore, this paper revisits the TD learning algorithm based on control as inference and derives a novel algorithm capable of learning robustly against noisy TD errors. First, the distribution model of optimality, a binary random variable, is represented by a sigmoid function. Alongside forward and reverse Kullback-Leibler divergences, this new model derives a…
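To make the abstract's starting point concrete, here is a minimal sketch of a one-step TD error and of a sigmoid used as the probability of a binary optimality variable. It is not the paper's algorithm. The function names, the scale parameter `beta`, and the choice to feed the TD error into the sigmoid are all assumptions made for illustration, since the truncated abstract does not give the model's exact form.

```python
import math


def td_error(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step TD error: r + gamma * V(s') - V(s).

    The target bootstraps from the current estimate V(s'), so any error
    in that estimate leaks directly into the TD error.
    """
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


def optimality_prob(delta, beta=1.0):
    """Hypothetical Bernoulli parameter p(O = 1) = sigmoid(delta / beta).

    Assumption: the sigmoid takes a scaled TD error as its argument.
    The source only states that optimality is binary and sigmoid-modeled.
    """
    return 1.0 / (1.0 + math.exp(-delta / beta))


# Same transition, but the second value estimate for s' is off by +5.0.
delta_clean = td_error(reward=1.0, value_s=0.5, value_next=0.6)
delta_noisy = td_error(reward=1.0, value_s=0.5, value_next=0.6 + 5.0)

# The estimation error reaches the TD error scaled by gamma.
print(delta_noisy - delta_clean)
print(optimality_prob(delta_clean), optimality_prob(delta_noisy))
```

The point of the comparison is only that an error in the bootstrapped value estimate passes into the TD error scaled by `gamma`, which is the kind of noise the abstract refers to.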
