Hybrid Differential Reward: Combining Temporal Difference and Action Gradients for Efficient Multi-Agent Reinforcement Learning in Cooperative Driving

Ye Han; Lijun Zhang; Dejian Meng; Zhuang Zhang

arXiv:2511.16916 · cs.AI · November 24, 2025

TL;DR

This paper introduces a Hybrid Differential Reward mechanism that combines temporal difference and action gradient signals to improve convergence and policy quality in multi-agent cooperative driving tasks with continuous control.

Contribution

The paper proposes the HDR framework, which integrates a Temporal Difference Reward (TRD) component with an action-gradient component (ARG) to address vanishing reward differences in high-frequency, continuous-control multi-agent environments.

Findings

01 HDR improves convergence speed in cooperative driving tasks

02 HDR enhances policy stability and safety

03 Experimental results validate effectiveness across multiple algorithms

Abstract

In multi-vehicle cooperative driving tasks involving high-frequency continuous control, traditional state-based reward functions suffer from the issue of vanishing reward differences. This phenomenon results in a low signal-to-noise ratio (SNR) for policy gradients, significantly hindering algorithm convergence and performance improvement. To address this challenge, this paper proposes a novel Hybrid Differential Reward (HDR) mechanism. We first theoretically elucidate how the temporal quasi-steady nature of traffic states and the physical proximity of actions lead to the failure of traditional reward signals. Building on this analysis, the HDR framework innovatively integrates two complementary components: (1) a Temporal Difference Reward (TRD) based on a global potential function, which utilizes the evolutionary trend of potential energy to ensure optimal policy invariance and…
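The abstract's description of a temporal difference reward built on a global potential function resembles standard potential-based reward shaping, where the shaping term is γΦ(s') − Φ(s). The sketch below illustrates only that general idea. The potential `potential` (negative mean squared speed error), the `target_speed` value, and the example speeds are hypothetical choices made for illustration; they are not taken from the paper, and the paper's actual potential function and reward definition may differ.

```python
def potential(speeds, target_speed=30.0):
    """Hypothetical global potential: negative mean squared speed error (assumption)."""
    return -sum((v - target_speed) ** 2 for v in speeds) / len(speeds)


def temporal_difference_reward(speeds_t, speeds_next, gamma=0.99):
    """Standard potential-based shaping term: gamma * Phi(s') - Phi(s)."""
    return gamma * potential(speeds_next) - potential(speeds_t)


# Two consecutive traffic states one short control step apart (made-up values).
speeds_t = [25.0, 26.0, 24.0]
speeds_next = [25.1, 26.1, 24.1]

# Reward for moving toward the target speed, and for moving away from it.
improving = temporal_difference_reward(speeds_t, speeds_next)
worsening = temporal_difference_reward(speeds_next, speeds_t)

# With gamma = 1 the shaping terms telescope along a trajectory, so their sum
# depends only on the first and last states, not on the path taken between them.
trajectory = [[25.0, 26.0, 24.0], [25.1, 26.1, 24.1], [25.3, 26.2, 24.4]]
telescoped = sum(
    temporal_difference_reward(a, b, gamma=1.0)
    for a, b in zip(trajectory, trajectory[1:])
)
endpoint_gap = potential(trajectory[-1]) - potential(trajectory[0])
```

Here `improving` is positive and `worsening` is negative, so the sign of the term tracks the trend of the potential rather than its absolute level. The telescoping property is the usual intuition behind why shaping of this form leaves the optimal policy unchanged.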

Taxonomy

Topics: Traffic control and management · Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics