Hybrid Differential Reward: Combining Temporal Difference and Action Gradients for Efficient Multi-Agent Reinforcement Learning in Cooperative Driving
Ye Han, Lijun Zhang, Dejian Meng, Zhuang Zhang

TL;DR
This paper introduces a Hybrid Differential Reward mechanism that combines temporal difference and action gradient signals to improve convergence and policy quality in multi-agent cooperative driving tasks with continuous control.
Contribution
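It proposes the HDR framework, which integrates a Temporal Difference Reward (TRD) with an action-gradient component (ARG) to address vanishing reward differences, and the resulting low policy-gradient signal-to-noise ratio, in high-frequency continuous-control multi-agent environments.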
Findings
HDR improves convergence speed in cooperative driving tasks
HDR enhances policy stability and safety
Experimental results validate effectiveness across multiple algorithms
Abstract
In multi-vehicle cooperative driving tasks involving high-frequency continuous control, traditional state-based reward functions suffer from the issue of vanishing reward differences. This phenomenon results in a low signal-to-noise ratio (SNR) for policy gradients, significantly hindering algorithm convergence and performance improvement. To address this challenge, this paper proposes a novel Hybrid Differential Reward (HDR) mechanism. We first theoretically elucidate how the temporal quasi-steady nature of traffic states and the physical proximity of actions lead to the failure of traditional reward signals. Building on this analysis, the HDR framework innovatively integrates two complementary components: (1) a Temporal Difference Reward (TRD) based on a global potential function, which utilizes the evolutionary trend of potential energy to ensure optimal policy invariance and…
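The TRD idea of rewarding the change in a global potential can be illustrated with generic potential-based reward shaping. The sketch below is not the paper's formulation: the `potential` function (negative mean squared deviation from a target speed), the `target_speed` value, and the list-of-speeds state representation are all hypothetical choices made only for illustration.

```python
def potential(speeds, target_speed=30.0):
    """Hypothetical global potential: negative mean squared deviation
    of vehicle speeds from a target speed (assumed, not from the paper)."""
    return -sum((v - target_speed) ** 2 for v in speeds) / len(speeds)


def temporal_difference_reward(state, next_state, gamma=0.99):
    """Generic potential-based shaping term: gamma * Phi(s') - Phi(s)."""
    return gamma * potential(next_state) - potential(state)


def undiscounted_shaping_sum(trajectory):
    """Sum the shaping terms along a trajectory with gamma = 1.
    The sum telescopes to Phi(s_T) - Phi(s_0)."""
    return sum(
        temporal_difference_reward(s, s_next, gamma=1.0)
        for s, s_next in zip(trajectory[:-1], trajectory[1:])
    )


# Three consecutive joint states of two vehicles (speeds in m/s, made up).
trajectory = [[20.0, 25.0], [21.0, 26.0], [22.0, 27.0]]
step_reward = temporal_difference_reward(trajectory[0], trajectory[1], gamma=1.0)
total = undiscounted_shaping_sum(trajectory)
```

With gamma set to 1, the summed shaping terms depend only on the first and last states of the trajectory. This is the usual intuition for why potential-based shaping leaves the optimal policy unchanged while still giving a per-step signal that tracks the trend of the potential.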
Taxonomy
Topics: Traffic control and management · Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics
