Distributional Soft Actor-Critic with Harmonic Gradient for Safe and Efficient Autonomous Driving in Multi-lane Scenarios
Feihong Zhang, Guojian Zhan, Bin Shuai, Tianyi Zhang, Jingliang Duan, Shengbo Eben Li

TL;DR
This paper introduces DSAC-H, a safe reinforcement learning algorithm for autonomous driving that uses harmonic policy iteration to balance driving efficiency against safety constraints, and reports near-zero safety violations in multi-lane simulations.
Contribution
The paper proposes harmonic policy iteration (HPI) to effectively balance safety and efficiency in RL, integrating it with DSAC to create a new safe RL algorithm for autonomous driving.
Findings
DSAC-H achieves efficient driving with minimal safety violations.
HPI effectively balances conflicting gradients for safety and efficiency.
Extensive simulations validate the effectiveness of DSAC-H in multi-lane scenarios.
Abstract
Reinforcement learning (RL), known for its self-evolution capability, offers a promising approach to training high-level autonomous driving systems. However, handling constraints remains a significant challenge for existing RL algorithms, particularly in real-world applications. In this paper, we propose a new safety-oriented training technique called harmonic policy iteration (HPI). At each RL iteration, it first calculates two policy gradients associated with efficient driving and safety constraints, respectively. Then, a harmonic gradient is derived for policy updating, minimizing conflicts between the two gradients and consequently enabling a more balanced and stable training process. Furthermore, we adopt the state-of-the-art DSAC algorithm as the backbone and integrate it with our HPI to develop a new safe RL algorithm, DSAC-H. Extensive simulations in multi-lane scenarios…
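The abstract does not give the formula for the harmonic gradient, so the following is only an illustrative sketch of the general idea of reconciling two conflicting policy gradients. It swaps in a PCGrad-style projection (each gradient has its component opposing the other removed before the two are summed), which is not claimed to be the HPI update from the paper. The names `g_eff`, `g_safe`, `project_out_conflict`, and `combined_gradient` are hypothetical and chosen here for illustration.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out_conflict(g: Vector, other: Vector) -> Vector:
    """Return g with its component that opposes `other` removed.

    ASSUMPTION: PCGrad-style projection, used as a stand-in for the
    paper's harmonic gradient, whose formula is not in the abstract.
    """
    d = dot(g, other)
    if d >= 0.0:
        # No conflict (this also covers a zero `other`): keep g as is.
        return list(g)
    scale = d / dot(other, other)
    return [x - scale * y for x, y in zip(g, other)]


def combined_gradient(g_eff: Vector, g_safe: Vector) -> Vector:
    """Sum the two gradients after removing their mutual conflict."""
    p_eff = project_out_conflict(g_eff, g_safe)
    p_safe = project_out_conflict(g_safe, g_eff)
    return [a + b for a, b in zip(p_eff, p_safe)]


# Hypothetical toy gradients that conflict (negative inner product).
g_eff = [1.0, 0.0]    # gradient for the efficiency objective
g_safe = [-1.0, 1.0]  # gradient for the safety objective
g = combined_gradient(g_eff, g_safe)
```

In this toy case the combined direction has a non-negative inner product with both input gradients, so a small step along it does not work against either objective to first order. That is the kind of property a conflict-minimizing update aims for.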
Taxonomy
Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Adversarial Robustness in Machine Learning
Methods: ADaptive gradient method with the OPTimal convergence rate
