Multi-State TD Target for Model-Free Reinforcement Learning
Wuhao Wang, Zhiyong Chen, and Lepeng Zhang

TL;DR
This paper introduces a multi-state TD target in reinforcement learning that leverages multiple subsequent states for improved value estimation, leading to enhanced learning performance in actor-critic algorithms.
Contribution
It proposes the novel multi-state TD (MSTD) target and integrates it into actor-critic algorithms with replay buffer management, improving over traditional single-state TD methods.
Findings
MSTD significantly outperforms traditional TD in learning speed.
Algorithms with MSTD show improved stability and convergence.
Experiments with MSTD-based variants of DDPG and SAC support the effectiveness of the proposed method.
Abstract
Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by incorporating both immediate rewards and the estimated value of subsequent states. Traditionally, TD learning relies on the value of a single subsequent state. We propose an enhanced multi-state TD (MSTD) target that utilizes the estimated values of multiple subsequent states. Building on this new MSTD concept, we develop complete actor-critic algorithms that include management of replay buffers in two modes, and integrate with deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). Experimental results demonstrate that algorithms employing the MSTD target significantly improve learning performance compared to traditional methods. The code…
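To make the contrast between a single-state and a multi-state target concrete, the following is a minimal sketch in Python. The one-step target r + gamma * V(s') is the standard TD target. The multi-state version shown here, a plain average of the 1-step through K-step bootstrapped targets, is only one illustrative way to combine the values of several subsequent states; the abstract does not give the paper's exact MSTD formula, so the function names, the equal weighting, and the parameter names are all assumptions.

```python
from typing import Sequence


def td_target(reward: float, next_value: float, gamma: float = 0.99) -> float:
    """Standard one-step TD target: r_t + gamma * V(s_{t+1})."""
    return reward + gamma * next_value


def n_step_target(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Discounted sum of the given rewards plus a bootstrapped value at the end."""
    total = 0.0
    for i, r in enumerate(rewards):
        total += (gamma ** i) * r
    return total + (gamma ** len(rewards)) * bootstrap_value


def multi_state_target(
    rewards: Sequence[float],
    next_values: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Hypothetical multi-state target (an assumption, not the paper's definition).

    rewards[i] is r_{t+i}; next_values[i] is the estimated value of s_{t+i+1}.
    The k-step targets for k = 1..K are averaged with equal weights.
    """
    if not rewards or len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must be non-empty and equal in length")
    targets = [
        n_step_target(rewards[:k], next_values[k - 1], gamma)
        for k in range(1, len(rewards) + 1)
    ]
    return sum(targets) / len(targets)


if __name__ == "__main__":
    # Two subsequent states, with gamma chosen so the arithmetic is easy to follow.
    print(td_target(1.0, 4.0, gamma=0.5))
    print(multi_state_target([1.0, 2.0], [4.0, 8.0], gamma=0.5))
```

With a single subsequent state (K = 1) this sketch reduces to the ordinary TD target, which is the property any multi-state generalization would be expected to keep.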
Taxonomy
Topics: Reinforcement Learning in Robotics
