Multi-State TD Target for Model-Free Reinforcement Learning

Wuhao Wang, Zhiyong Chen, and Lepeng Zhang

arXiv:2405.16522 · cs.LG · August 5, 2024

TL;DR

This paper introduces a multi-state TD target in reinforcement learning that leverages multiple subsequent states for improved value estimation, leading to enhanced learning performance in actor-critic algorithms.
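
For reference, the standard single-state TD target for a state-value estimate V bootstraps from exactly one successor state, while a k-step target bootstraps from the state k steps ahead. A multi-state target draws on the values of several subsequent states. The averaged form in the last line is only an illustrative assumption (a uniform average of the k-step targets up to a hypothetical horizon n); it is not a formula quoted from the paper.

```latex
% Standard (single-state) TD target
y_t^{(1)} = r_t + \gamma V(s_{t+1})

% k-step target: k discounted rewards, then bootstrap from s_{t+k}
y_t^{(k)} = \sum_{i=0}^{k-1} \gamma^{i} r_{t+i} + \gamma^{k} V(s_{t+k})

% Illustrative multi-state target (assumed uniform average over k = 1, ..., n)
y_t^{\mathrm{avg}} = \frac{1}{n} \sum_{k=1}^{n} y_t^{(k)}
```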

Contribution

It proposes the novel multi-state TD (MSTD) target and integrates it into actor-critic algorithms with replay buffer management, improving over traditional single-state TD methods.
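
A target that uses several subsequent states needs a replay buffer that can return short runs of consecutive transitions rather than isolated tuples. The following is a minimal sketch of one way to do that, under stated assumptions: `Transition`, `SequenceReplayBuffer`, `sample_segment`, and `sample` are hypothetical names, the buffer is a plain FIFO ring buffer, and it does not reproduce the paper's two buffer-management modes, whose details are not given on this page.

```python
import random
from typing import List, NamedTuple


class Transition(NamedTuple):
    state: object
    action: object
    reward: float
    next_state: object
    done: bool  # assumed to mark every episode boundary


class SequenceReplayBuffer:
    """Hypothetical FIFO ring buffer that returns runs of up to n consecutive transitions."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.storage: List[Transition] = []
        self.pos = 0  # physical slot for the next write; also the oldest slot once full

    def add(self, transition: Transition) -> None:
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition  # overwrite the oldest transition
        self.pos = (self.pos + 1) % self.capacity

    def _get(self, j: int) -> Transition:
        """Return the j-th stored transition in insertion order (0 = oldest)."""
        start = self.pos if len(self.storage) == self.capacity else 0
        return self.storage[(start + j) % self.capacity]

    def sample_segment(self, n: int) -> List[Transition]:
        """Pick a random start and return up to n consecutive transitions.

        The run stops early at an episode end (done=True) or at the newest
        transition, so it never mixes episodes or wraps past the write head.
        Raises ValueError if the buffer is empty.
        """
        size = len(self.storage)
        t = random.randrange(size)
        segment: List[Transition] = []
        for j in range(t, min(t + n, size)):
            tr = self._get(j)
            segment.append(tr)
            if tr.done:
                break
        return segment

    def sample(self, batch_size: int, n: int) -> List[List[Transition]]:
        return [self.sample_segment(n) for _ in range(batch_size)]
```

Indexing in insertion order (rather than by physical slot) keeps the "consecutive" logic independent of where the ring buffer's write head currently sits. Segments near the newest data or an episode end are shorter than n, so any target computed from them has to handle variable lengths.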

Findings

1. MSTD significantly outperforms traditional TD in learning speed.

2. Algorithms with MSTD show improved stability and convergence.

3. Experimental results validate the effectiveness of the MSTD target when integrated with DDPG and SAC.

Abstract

Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by incorporating both the immediate reward and the estimated value of subsequent states. Traditionally, TD learning relies on the value of a single subsequent state. We propose an enhanced multi-state TD (MSTD) target that utilizes the estimated values of multiple subsequent states. Building on this new MSTD concept, we develop complete actor-critic algorithms that include management of replay buffers in two modes, and integrate them with deep deterministic policy gradient (DDPG) and soft actor-critic (SAC). Experimental results demonstrate that algorithms employing the MSTD target significantly improve learning performance compared to traditional methods. The code…
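
To make the single-state versus multi-state contrast concrete, here is a minimal Python sketch. `td_target` is the standard TD target r + γV(s′), with no bootstrap at a terminal state. The multi-state variant is an assumption for illustration only: it uniformly averages the k-step targets for k = 1, ..., n, which is one simple way to use the values of several subsequent states, and it is not necessarily the exact MSTD target defined in the paper. All function names are hypothetical.

```python
from typing import Sequence


def td_target(reward: float, next_value: float, done: bool, gamma: float = 0.99) -> float:
    """Standard single-state TD target: r_t + gamma * V(s_{t+1}); no bootstrap at terminal."""
    return reward + gamma * (0.0 if done else next_value)


def k_step_target(rewards: Sequence[float], next_values: Sequence[float],
                  dones: Sequence[bool], k: int, gamma: float = 0.99) -> float:
    """k-step target: k discounted rewards plus the discounted value of s_{t+k}.

    Entry i describes transition t+i: reward r_{t+i}, estimated value of
    s_{t+i+1}, and whether s_{t+i+1} is terminal.
    """
    target = 0.0
    for i in range(k):
        target += gamma ** i * rewards[i]
        if dones[i]:
            return target  # episode ended inside the window: stop, do not bootstrap
    return target + gamma ** k * next_values[k - 1]


def averaged_multi_state_target(rewards: Sequence[float], next_values: Sequence[float],
                                dones: Sequence[bool], n: int, gamma: float = 0.99) -> float:
    """Hypothetical multi-state target: uniform average of the k-step targets, k = 1..n."""
    n = min(n, len(rewards))  # shorter segments (e.g. near the newest data) use fewer states
    total = sum(k_step_target(rewards, next_values, dones, k, gamma) for k in range(1, n + 1))
    return total / n


# Two transitions with gamma = 0.5: the 1-step target is 2.0, the 2-step target is 2.5.
print(averaged_multi_state_target([1.0, 1.0], [2.0, 4.0], [False, False], n=2, gamma=0.5))  # → 2.25
```

With n = 1 the averaged target reduces to the standard single-state TD target, so the sketch contains ordinary TD learning as a special case. In an actor-critic setting such as DDPG or SAC, `next_values` would come from a target critic evaluated along a sampled run of consecutive transitions.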

Code & Models

Repositories

WuhaoStatistic/MSTD-Multi-State-TD-Target-for-Model-Free-Reinforcement-Learning
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics