Distributional Soft Actor-Critic with Harmonic Gradient for Safe and Efficient Autonomous Driving in Multi-lane Scenarios

Feihong Zhang; Guojian Zhan; Bin Shuai; Tianyi Zhang; Jingliang Duan; Shengbo Eben Li

arXiv:2505.13532 · cs.RO · May 21, 2025

TL;DR

This paper introduces DSAC-H, a safe reinforcement learning algorithm for autonomous driving that balances efficiency and safety constraints using harmonic policy iteration, showing near-zero safety violations in multi-lane simulations.

Contribution

The paper proposes harmonic policy iteration (HPI) to effectively balance safety and efficiency in RL, integrating it with DSAC to create a new safe RL algorithm for autonomous driving.

Findings

1. DSAC-H achieves efficient driving with minimal safety violations.

2. HPI effectively balances conflicting gradients for safety and efficiency.

3. Extensive simulations validate the effectiveness of DSAC-H in multi-lane scenarios.

Abstract

Reinforcement learning (RL), known for its self-evolution capability, offers a promising approach to training high-level autonomous driving systems. However, handling constraints remains a significant challenge for existing RL algorithms, particularly in real-world applications. In this paper, we propose a new safety-oriented training technique called harmonic policy iteration (HPI). At each RL iteration, it first calculates two policy gradients associated with efficient driving and safety constraints, respectively. Then, a harmonic gradient is derived for policy updating, minimizing conflicts between the two gradients and consequently enabling a more balanced and stable training process. Furthermore, we adopt the state-of-the-art DSAC algorithm as the backbone and integrate it with our HPI to develop a new safe RL algorithm, DSAC-H. Extensive simulations in multi-lane scenarios…
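The abstract describes combining an efficiency gradient and a safety gradient into a single update direction that minimizes conflict between them, but it does not give the harmonic gradient formula. The sketch below is therefore not the paper's HPI rule. It illustrates the general idea with a different, well-known technique, a PCGrad-style projection: when the two gradients point in opposing directions (negative dot product), each one has its conflicting component removed before they are summed. The names `dot`, `project_out_conflict`, and `combine_gradients` are hypothetical, and gradients are represented as plain lists of floats for simplicity.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out_conflict(g: Vector, other: Vector) -> Vector:
    """Return g with its component along `other` removed, but only when
    the two gradients conflict (negative dot product). Otherwise return
    a copy of g unchanged."""
    d = dot(g, other)
    norm_sq = dot(other, other)
    if d >= 0.0 or norm_sq == 0.0:
        return list(g)
    scale = d / norm_sq
    return [x - scale * y for x, y in zip(g, other)]


def combine_gradients(g_eff: Vector, g_safe: Vector) -> Vector:
    """PCGrad-style stand-in for a conflict-reducing update direction.
    ASSUMPTION: this is an illustrative substitute, not the harmonic
    gradient defined in the paper."""
    p_eff = project_out_conflict(g_eff, g_safe)
    p_safe = project_out_conflict(g_safe, g_eff)
    return [a + b for a, b in zip(p_eff, p_safe)]


# Hypothetical example: the two gradients conflict (dot product is -1).
g_eff = [1.0, 0.0]     # gradient for the efficiency objective
g_safe = [-1.0, 1.0]   # gradient for the safety objective
update = combine_gradients(g_eff, g_safe)
```

With two objectives, the combined direction from this projection has a non-negative dot product with both original gradients, so to first order a small step along it does not move against either objective. Whether the paper's harmonic gradient shares this property is not stated in the abstract.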

Taxonomy

Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Adversarial Robustness in Machine Learning

Methods: ADaptive gradient method with the OPTimal convergence rate