Critic-Free Deep Reinforcement Learning for Maritime Coverage Path Planning on Irregular Hexagonal Grids
Carlos S. Sepúlveda, Gonzalo A. Ruz

TL;DR
This paper introduces a critic-free deep reinforcement learning method using Transformer-based policies for efficient maritime coverage path planning on irregular grids, outperforming heuristics in success rate and path efficiency.
Contribution
The paper presents a critic-free DRL framework with a Transformer-based pointer policy for maritime CPP. The policy handles irregular areas without per-instance re-planning and outperforms heuristic baselines in success rate and path efficiency.
Findings
Achieved 99.0% success rate in synthetic environments.
Compared with heuristic baselines, paths are 7% shorter and have 24% fewer heading changes.
Inference runs under 50 ms per instance on a laptop GPU.
Abstract
Maritime surveillance missions, such as search and rescue and environmental monitoring, rely on the efficient allocation of sensing assets over vast and geometrically complex areas. Traditional Coverage Path Planning (CPP) approaches depend on decomposition techniques that struggle with irregular coastlines, islands, and exclusion zones, or require computationally expensive re-planning for every instance. We propose a Deep Reinforcement Learning (DRL) framework to solve CPP on hexagonal grid representations of irregular maritime areas. Unlike conventional methods, we formulate the problem as a neural combinatorial optimization task where a Transformer-based pointer policy autoregressively constructs coverage tours. To overcome the instability of value estimation in long-horizon routing problems, we implement a critic-free Group-Relative Policy Optimization (GRPO) scheme. This method…
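The critic-free idea mentioned above can be illustrated with a small sketch. In group-relative schemes, several tours are sampled for the same instance and each tour's reward is compared against the statistics of its own group, so no learned value function is needed. The function name `group_relative_advantages`, the mean/standard-deviation normalization, and the reward values are all assumptions made for illustration; the abstract excerpt does not specify the paper's exact baseline or reward definition.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout relative to the other rollouts in its group.

    Assumption: advantages are rewards standardized by the group's mean and
    population standard deviation. The paper's exact formula may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when all rollouts score the same
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards (e.g. negative tour length) for four tours
# sampled by the policy on the same hexagonal grid instance.
rewards = [-42.0, -40.0, -45.0, -41.0]
advantages = group_relative_advantages(rewards)
```

Tours that beat the group average receive positive advantages and are reinforced; tours below it are discouraged. A policy-gradient update would then weight each tour's log-probability by its advantage.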
