A Comparison of Reinforcement Learning and Optimal Control Methods for Path Planning
Qiang Le, Yaguang Yang, Isaac E. Weintraub

TL;DR
This paper compares reinforcement learning and optimal control for path planning in autonomous vehicles, demonstrating that DDPG offers faster, effective paths suitable for real-time decision-making in threat environments.
Contribution
It introduces a DDPG-based path planning method modeling threat zones, and compares its performance with traditional optimal control, highlighting advantages in computational speed.
Findings
DDPG finds feasible paths faster than optimal control.
The method identifies the largest feasible starting set for safe paths.
Paths in the feasible set may not always be optimal.
Abstract
Path-planning for autonomous vehicles in threat-laden environments is a fundamental challenge. While traditional optimal control methods can find ideal paths, the computational time is often too slow for real-time decision-making. To solve this challenge, we propose a method based on Deep Deterministic Policy Gradient (DDPG) and model the threat as a simple, circular `no-go' zone. A mission failure is claimed if the vehicle enters this `no-go' zone at any time or does not reach a neighborhood of the destination. The DDPG agent is trained to learn a direct mapping from its current state (position and velocity) to a series of feasible actions that guide the agent to safely reach its goal. A reward function and two neural networks, critic and actor, are used to describe the environment and guide the control efforts. The DDPG trains the agent to find the largest possible set of starting…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
