Mitigating Estimation Bias with Representation Learning in TD Error-Driven Regularization

Haohui Chen, Zhiyong Chen, Aoxiang Liu, and Wentuo Fang

arXiv:2511.16090·cs.LG·November 21, 2025

Open Access

TL;DR

This paper introduces a flexible bias-control method for deterministic policy gradient algorithms. It builds on a double actor-critic framework and uses novel convex combination strategies to improve value estimation and exploration in continuous control tasks.

Contribution

It presents a new bias mitigation approach with tunable control via convex combinations and augmented representations, improving over existing methods in continuous control reinforcement learning.

Findings

01 Outperforms benchmark algorithms across various tasks.

02 Demonstrates effective bias control with a single hyperparameter.

03 Shows both overestimation and underestimation can be beneficial depending on the environment.

Abstract

Deterministic policy gradient algorithms for continuous control suffer from value estimation biases that degrade performance. While double critics reduce such biases, the exploration potential of double actors remains underexplored. Building on temporal-difference error-driven regularization (TDDR), a double actor-critic framework, this work introduces enhanced methods to achieve flexible bias control and stronger representation learning. We propose three convex combination strategies, symmetric and asymmetric, that balance pessimistic estimates to mitigate overestimation and optimistic exploration via double actors to alleviate underestimation. A single hyperparameter governs this mechanism, enabling tunable control across the bias spectrum. To further improve performance, we integrate augmented state and action representations into the actor and critic networks. Extensive experiments…
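
The abstract does not state the update rules, so the following is only a minimal sketch of the general idea of blending a pessimistic and an optimistic value estimate with one weight. It is not one of the paper's three strategies. The function name `convex_td_target`, the weight `beta`, and the use of two critic values as the source of both estimates are assumptions made for illustration; in the paper the optimistic side is tied to double actors, which this sketch does not model.

```python
from typing import List, Sequence


def convex_td_target(
    rewards: Sequence[float],
    q1_next: Sequence[float],
    q2_next: Sequence[float],
    dones: Sequence[bool],
    beta: float,
    gamma: float = 0.99,
) -> List[float]:
    """Blend pessimistic and optimistic next-state values into a TD target.

    Hypothetical illustration only: beta = 1 gives the fully pessimistic
    (min) target, beta = 0 the fully optimistic (max) target, and values
    in between interpolate linearly.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")

    targets = []
    for r, q1, q2, done in zip(rewards, q1_next, q2_next, dones):
        pessimistic = min(q1, q2)  # guards against overestimation
        optimistic = max(q1, q2)   # guards against underestimation
        blended = beta * pessimistic + (1.0 - beta) * optimistic
        # No bootstrapping from terminal transitions.
        targets.append(r + (0.0 if done else gamma * blended))
    return targets


if __name__ == "__main__":
    # One non-terminal transition, swept across the assumed bias weight.
    for b in (0.0, 0.5, 1.0):
        print(b, convex_td_target([1.0], [2.0], [4.0], [False], beta=b, gamma=0.5))
```

Sweeping `beta` from 0 to 1 moves the target monotonically from the larger estimate to the smaller one, which is the sense in which a single hyperparameter can cover the bias spectrum.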

Taxonomy

Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adversarial Robustness in Machine Learning