SoftCTRL: Soft conservative KL-control of Transformer Reinforcement Learning for Autonomous Driving
Minh Tri Huynh, Duc Dung Nguyen

TL;DR
This paper introduces SoftCTRL, a novel approach combining imitation learning and reinforcement learning with an entropy-KL control to improve safety, robustness, and human-like behavior in autonomous driving scenarios.
Contribution
It proposes an implicit entropy-KL control method that reduces the over-conservatism of combined IL-RL training, improving robustness and the fidelity of the generated driving behavior.
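One possible reading of "entropy-KL control" rests on the standard identity KL(π_RL ‖ π_IL) = H(π_RL, π_IL) − H(π_RL). This reading is an assumption, not the authors' stated formulation. Under it, adding an entropy bonus α·H(π_RL) to a KL-penalized objective raises the effective entropy weight from β to β + α. The cross-entropy pull toward the IL policy stays at β. The Python sketch below checks this numerically. All function names, the three-action distributions, and the coefficients `beta` and `alpha` are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy H(p) of a discrete distribution, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum_a p(a) log q(a), in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    """KL(p || q) = sum_a p(a) log(p(a) / q(a)), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def objective_kl_entropy(expected_return, p, q, beta, alpha):
    """Hypothetical objective: J - beta * KL(p || q) + alpha * H(p)."""
    return expected_return - beta * kl(p, q) + alpha * entropy(p)

def objective_rewritten(expected_return, p, q, beta, alpha):
    """Same objective rewritten as J - beta * H(p, q) + (beta + alpha) * H(p)."""
    return expected_return - beta * cross_entropy(p, q) + (beta + alpha) * entropy(p)

# Made-up action distributions over three discrete maneuvers (hypothetical).
pi_rl = [0.6, 0.3, 0.1]  # RL policy
pi_il = [0.5, 0.4, 0.1]  # frozen IL (human-like) prior

# The decomposition KL = cross-entropy - entropy holds numerically.
identity_holds = math.isclose(
    kl(pi_rl, pi_il), cross_entropy(pi_rl, pi_il) - entropy(pi_rl)
)
# Both ways of writing the objective give the same value.
same_objective = math.isclose(
    objective_kl_entropy(1.0, pi_rl, pi_il, beta=0.1, alpha=0.05),
    objective_rewritten(1.0, pi_rl, pi_il, beta=0.1, alpha=0.05),
)
```

Under this reading, increasing α loosens the entropy side of the KL constraint. It does not change how strongly the RL policy is pulled toward the IL policy's likelihood. That is one plausible way to soften over-conservatism. The paper should be consulted for the exact formulation.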
Findings
Over 17% reduction in failures compared to baseline methods.
Significant improvement in robustness across unseen urban scenarios.
Generated driving behavior closely mimics human drivers.
Abstract
In recent years, motion planning for urban self-driving vehicles (SDVs) has become a popular problem because of the complex interactions among road components. To tackle it, many methods rely on large-scale, human-sampled data processed through imitation learning (IL). Although effective, IL alone cannot adequately address safety and reliability concerns. Combining IL with reinforcement learning (RL) by adding the KL divergence between the RL and IL policies to the RL loss can alleviate IL's weaknesses, but it suffers from over-conservatism caused by the covariate shift of IL. To address this limitation, we introduce a method that combines IL with RL using an implicit entropy-KL control, which offers a simple way to reduce this over-conservatism. In particular, we validate the method on different challenging simulated urban scenarios from an unseen dataset, indicating that although IL can perform well in…
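The baseline the abstract describes adds a KL penalty between the RL and IL policies to the RL loss. It can be sketched as follows. This is a minimal illustration under stated assumptions: a discrete action space with softmax policies, a simple advantage-weighted policy-gradient term, and a hypothetical penalty weight `beta`. SoftCTRL's actual Transformer policy, action space, and loss may differ. The function names are illustrative only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions, in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_regularized_pg_loss(rl_logits, il_logits, actions, advantages, beta):
    """Hypothetical batch loss: mean of -A * log pi_RL(a|s) + beta * KL(pi_RL || pi_IL).

    rl_logits, il_logits: per-state logit lists from the RL and (frozen) IL policies.
    actions: index of the action taken in each state.
    advantages: advantage estimate for each (state, action) pair.
    beta: weight of the KL penalty toward the IL policy.
    """
    total = 0.0
    for lr, li, a, adv in zip(rl_logits, il_logits, actions, advantages):
        p = softmax(lr)  # RL policy pi_RL(.|s)
        q = softmax(li)  # IL policy pi_IL(.|s), treated as fixed
        policy_gradient_term = -adv * math.log(p[a])
        total += policy_gradient_term + beta * kl_divergence(p, q)
    return total / len(actions)

# Usage: one state, two actions; the IL prior disagrees with the RL policy.
loss_rl_only = kl_regularized_pg_loss([[1.0, 0.0]], [[0.0, 1.0]], [0], [1.0], beta=0.0)
loss_with_kl = kl_regularized_pg_loss([[1.0, 0.0]], [[0.0, 1.0]], [0], [1.0], beta=0.5)
```

In an autodiff framework, the IL logits would be treated as constants (stop-gradient), so that only the RL policy is pulled toward the human-like prior. This pure-Python version computes only the loss value. A large `beta` keeps the RL policy close to the IL policy, which is the source of the over-conservatism the abstract points to.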
Taxonomy
Topics: Electric and Hybrid Vehicle Technologies · Elevator Systems and Control · Vehicle Dynamics and Control Systems
