Enhancing Q-Value Updates in Deep Q-Learning via Successor-State Prediction

Lipeng Zu; Hansong Zhou; Xiaonan Zhang

arXiv:2511.03836·cs.LG·November 7, 2025

TL;DR

This paper introduces SADQ, a deep Q-learning method that models environment dynamics to improve the stability and efficiency of value updates, outperforming standard DQN variants on benchmark tasks.

Contribution

SADQ explicitly incorporates successor-state distributions into Q-value updates, reducing variance and enhancing learning stability compared to existing methods.
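The page does not state SADQ's update rule, so the block below is only one plausible reading of "incorporating successor-state distributions into Q-value updates", not the authors' exact formulation. It assumes a target network with parameters $\theta^-$, a learned stochastic transition model $\hat{P}(\cdot \mid s, a)$, and $K$ sampled successors (all assumptions here), and contrasts the standard single-next-state DQN target with an expectation over modeled successors. The last line is the standard reason averaging can reduce variance: for i.i.d. draws, the variance of a sample mean shrinks by $1/K$.

```latex
% Standard DQN target: bootstraps from the single next state s' stored in replay
y^{\mathrm{DQN}} = r + \gamma \max_{a'} Q_{\theta^-}(s', a')

% Assumed aggregated target: expectation over a learned model \hat{P}(\cdot \mid s, a),
% estimated with K sampled successors \hat{s}'_1, \dots, \hat{s}'_K
y^{\mathrm{agg}} = r + \gamma \, \mathbb{E}_{\hat{s}' \sim \hat{P}(\cdot \mid s, a)}
  \Big[ \max_{a'} Q_{\theta^-}(\hat{s}', a') \Big]
  \approx r + \frac{\gamma}{K} \sum_{k=1}^{K} \max_{a'} Q_{\theta^-}(\hat{s}'_k, a')

% If the K draws are i.i.d., the sampling variance of the bootstrap term shrinks by 1/K
\operatorname{Var}\!\Big[\tfrac{1}{K}\textstyle\sum_{k=1}^{K} X_k\Big] = \tfrac{1}{K}\operatorname{Var}[X],
\qquad X = \max_{a'} Q_{\theta^-}(\hat{s}', a')
```

This variance reduction only holds for the sampling noise of the transition itself; an inaccurate $\hat{P}$ would add bias that averaging cannot remove.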

Findings

1. SADQ achieves higher stability in training.
2. SADQ outperforms DQN variants in benchmark tasks.
3. SADQ demonstrates improved learning efficiency.

Abstract

Deep Q-Networks (DQNs) estimate future returns by learning from transitions sampled from a replay buffer. However, DQN target updates often rely on next states generated by actions from a past, potentially suboptimal, policy. As a result, these states may not provide informative learning signals and can introduce high variance into the update process. This issue is exacerbated when the sampled transitions are poorly aligned with the agent's current policy. To address this limitation, we propose the Successor-state Aggregation Deep Q-Network (SADQ), which explicitly models environment dynamics using a stochastic transition model. SADQ integrates successor-state distributions into the Q-value estimation process, enabling more stable and policy-aligned value updates. Additionally, it explores a more efficient action-selection strategy that exploits the modeled transition structure. We provide…
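The abstract does not give SADQ's update rule, so the following is only a sketch under stated assumptions. The learned stochastic transition model is stood in for by a hypothetical `sample_next(s, a)` sampler, the aggregated target simply averages `max_a' Q(s', a')` over `k` sampled successors, and the toy environment (state `s` moves to `s - 1` or `s + 1` with equal probability, `Q(s, a) = s + a`) is invented for illustration. The names `dqn_target`, `aggregated_target`, and `compare_variance` are hypothetical, not the authors' code. The sketch contrasts the usual single-next-state target with the averaged one and checks that averaging lowers target variance when the model is accurate.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple

# Q(s, a) and the (hypothetical) learned transition sampler are plain callables.
QFn = Callable[[int, int], float]
Sampler = Callable[[int, int], int]


def dqn_target(reward: float, next_state: int, done: bool, q: QFn,
               actions: Sequence[int], gamma: float = 0.99) -> float:
    """Standard DQN target: bootstraps from the single next state stored in replay."""
    if done:
        return reward
    return reward + gamma * max(q(next_state, a) for a in actions)


def aggregated_target(reward: float, state: int, action: int, done: bool, q: QFn,
                      actions: Sequence[int], sample_next: Sampler,
                      k: int = 8, gamma: float = 0.99) -> float:
    """Assumed successor-state-aggregated target (not SADQ's exact rule):
    average the bootstrap term over k successors drawn from a model P_hat(. | s, a)."""
    if done:
        return reward
    successors = [sample_next(state, action) for _ in range(k)]
    bootstrap = statistics.fmean(max(q(s2, a) for a in actions) for s2 in successors)
    return reward + gamma * bootstrap


def compare_variance(n: int = 2000, k: int = 8,
                     seed: int = 0) -> Tuple[float, float, float, float]:
    """Toy check: the environment moves s -> s +/- 1 uniformly and the model is
    assumed to be perfect. Returns (mean_single, var_single, mean_agg, var_agg)."""
    rng = random.Random(seed)
    q: QFn = lambda s, a: float(s + a)  # toy Q-table, so max_a Q(s', a) = s' + 1
    step: Sampler = lambda s, a: s + rng.choice([-1, 1])  # used as env and as model
    actions, s, a, r = [0, 1], 5, 0, 0.0
    single = [dqn_target(r, step(s, a), False, q, actions) for _ in range(n)]
    agg = [aggregated_target(r, s, a, False, q, actions, step, k=k) for _ in range(n)]
    return (statistics.fmean(single), statistics.pvariance(single),
            statistics.fmean(agg), statistics.pvariance(agg))


if __name__ == "__main__":
    m1, v1, m2, v2 = compare_variance()
    print(f"single-sample target: mean={m1:.3f} var={v1:.3f}")
    print(f"aggregated target:    mean={m2:.3f} var={v2:.3f}")
```

With a perfect model both targets share the same mean, and averaging `k` i.i.d. successors cuts the transition-sampling variance by roughly a factor of `k`. With an inaccurate model the averaged target would instead inherit the model's bias, a trade-off this toy deliberately ignores.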

Taxonomy

Topics: Age of Information Optimization · Reinforcement Learning in Robotics · Data Stream Mining Techniques