Mitigating Estimation Bias with Representation Learning in TD Error-Driven Regularization
Haohui Chen, Zhiyong Chen, Aoxiang Liu, and Wentuo Fang

TL;DR
This paper introduces a flexible bias control method for deterministic policy gradient algorithms. It builds on a double actor-critic framework and uses novel convex combination strategies to improve value estimation and exploration in continuous control tasks.
Contribution
It presents a new bias mitigation approach with tunable control via convex combinations and augmented representations, improving over existing methods in continuous control reinforcement learning.
Findings
The method outperforms benchmark algorithms across various continuous control tasks.
A single hyperparameter is sufficient to control the estimation bias effectively.
Either overestimation or underestimation can be beneficial, depending on the environment.
Abstract
Deterministic policy gradient algorithms for continuous control suffer from value estimation biases that degrade performance. While double critics reduce such biases, the exploration potential of double actors remains underexplored. Building on temporal-difference error-driven regularization (TDDR), a double actor-critic framework, this work introduces enhanced methods to achieve flexible bias control and stronger representation learning. We propose three convex combination strategies, symmetric and asymmetric, that balance pessimistic estimates to mitigate overestimation and optimistic exploration via double actors to alleviate underestimation. A single hyperparameter governs this mechanism, enabling tunable control across the bias spectrum. To further improve performance, we integrate augmented state and action representations into the actor and critic networks. Extensive experiments…
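To make the idea of a single hyperparameter spanning the bias spectrum concrete, here is a minimal sketch of one possible convex combination of a pessimistic and an optimistic value estimate inside a TD target. It is an illustration only, not one of the paper's three strategies: the function name `convex_target`, the mixing weight `beta`, and the inputs `q_a` and `q_b` (two candidate next-state value estimates) are all assumptions introduced here.

```python
from typing import List, Optional, Sequence


def convex_target(
    rewards: Sequence[float],
    q_a: Sequence[float],
    q_b: Sequence[float],
    beta: float,
    gamma: float = 0.99,
    dones: Optional[Sequence[bool]] = None,
) -> List[float]:
    """Hypothetical TD target mixing pessimistic and optimistic estimates.

    beta = 1.0 uses only the minimum (pessimistic) estimate,
    beta = 0.0 uses only the maximum (optimistic) estimate,
    and values in between interpolate linearly.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if dones is None:
        dones = [False] * len(rewards)

    targets = []
    for r, qa, qb, done in zip(rewards, q_a, q_b, dones):
        pessimistic = min(qa, qb)  # counters overestimation
        optimistic = max(qa, qb)   # counters underestimation
        mixed = beta * pessimistic + (1.0 - beta) * optimistic
        # No bootstrapping from terminal transitions.
        targets.append(r + (0.0 if done else gamma * mixed))
    return targets
```

Under these assumptions, sweeping `beta` from 1 to 0 moves the target from the clipped-minimum style estimate toward a purely optimistic one, which is the kind of tunable control the abstract describes.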
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adversarial Robustness in Machine Learning
