CORAL: COntextual Reasoning And Local Planning in A Hierarchical VLM Framework for Underwater Monitoring
Zhenqi Wu, Yuanjie Lu, Xuesu Xiao, Xiaomin Lin

TL;DR
CORAL is a hierarchical framework that combines semantic reasoning and reactive control to improve autonomous underwater exploration, significantly increasing coverage and safety while reducing reliance on costly VLM inferences.
Contribution
It introduces a decoupled approach that separates high-level semantic guidance from low-level control, addressing limitations of end-to-end VLM systems in underwater AUV navigation.
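To make the decoupling concrete, here is a minimal toy sketch of a two-layer loop on a grid. It is not the authors' implementation: `stub_planner` stands in for a VLM query, `reactive_step` is a simple greedy obstacle-avoiding controller, and the `replan_every` interval, the grid world, and all names are assumptions chosen for illustration only. The point it shows is structural: the slow planner is consulted occasionally, while the fast layer moves every step and refuses to enter occupied cells.

```python
from dataclasses import dataclass
from typing import Callable, Set, Tuple

Cell = Tuple[int, int]


@dataclass
class RunStats:
    steps: int = 0
    planner_calls: int = 0
    collisions: int = 0


def stub_planner(pos: Cell) -> Cell:
    """Hypothetical stand-in for a VLM query: pick a waypoint 5 cells ahead."""
    return (pos[0] + 5, pos[1])


def reactive_step(pos: Cell, goal: Cell, obstacles: Set[Cell]) -> Cell:
    """Low-level layer: greedy move toward the goal, never into an obstacle."""
    x, y = pos
    # Sideways moves are listed before the backward move so ties avoid backtracking.
    candidates = [(x + 1, y), (x, y + 1), (x, y - 1), (x - 1, y)]
    free = [c for c in candidates if c not in obstacles]
    if not free:
        return pos  # boxed in: hold position rather than collide
    return min(free, key=lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1]))


def run(
    start: Cell,
    planner: Callable[[Cell], Cell],
    obstacles: Set[Cell],
    max_steps: int = 50,
    replan_every: int = 10,
) -> Tuple[Cell, RunStats]:
    """Call the planner only when needed; run the controller every step."""
    stats = RunStats()
    pos = start
    goal = None
    for t in range(max_steps):
        # High-level layer: replan on arrival or on a fixed (assumed) interval.
        if goal is None or pos == goal or t % replan_every == 0:
            goal = planner(pos)
            stats.planner_calls += 1
        pos = reactive_step(pos, goal, obstacles)
        if pos in obstacles:
            stats.collisions += 1
        stats.steps += 1
    return pos, stats


if __name__ == "__main__":
    final_pos, stats = run((0, 0), stub_planner, obstacles={(2, 0)})
    print(final_pos, stats)
```

In this toy setup the planner is invoked far less often than the controller steps, which is the general mechanism by which a decoupled design can cut expensive inference calls; the actual figures reported for CORAL come from the paper's own experiments, not from this sketch.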
Findings
Coverage increased by 14.28 percentage points
Collisions reduced to zero
VLM calls reduced by 57%
Abstract
Oyster reefs are critical ecosystem species that sustain biodiversity, filter water, and protect coastlines, yet they continue to decline globally. Restoring these ecosystems requires regular underwater monitoring to assess reef health, a task that remains costly, hazardous, and limited when performed by human divers. Autonomous underwater vehicles (AUVs) offer a promising alternative, but existing AUVs rely on geometry-based navigation that cannot interpret scene semantics. Recent vision-language models (VLMs) enable semantic reasoning for intelligent exploration, but existing VLM-driven systems adopt an end-to-end paradigm, introducing three key limitations. First, these systems require the VLM to generate every navigation decision, forcing frequent waits for inference. Second, VLMs cannot model robot dynamics, causing collisions in cluttered environments. Third, limited…
Taxonomy
Topics: Multimodal Machine Learning Applications · Underwater Vehicles and Communication Systems · Robotic Path Planning Algorithms
