CaveSeg: Deep Semantic Segmentation and Scene Parsing for Autonomous Underwater Cave Exploration
A. Abdullah, T. Barua, R. Tibbetts, Z. Chen, M. J. Islam, I. Rekleitis

TL;DR
CaveSeg introduces a deep learning pipeline and a new annotated dataset for semantic segmentation of underwater cave scenes, supporting AUV navigation with near real-time scene parsing and state-of-the-art accuracy.
Contribution
The paper presents the first visual learning pipeline and a comprehensive dataset for underwater cave scene parsing, along with a computationally light transformer-based model that runs in near real time for AUV navigation.
Findings
Robust deep models can be trained for underwater cave scene parsing.
The transformer-based model achieves near real-time performance.
Benchmark results demonstrate state-of-the-art accuracy.
Abstract
In this paper, we present CaveSeg - the first visual learning pipeline for semantic segmentation and scene parsing for AUV navigation inside underwater caves. We address the problem of scarce annotated training data by preparing a comprehensive dataset for semantic segmentation of underwater cave scenes. It contains pixel annotations for important navigation markers (e.g. caveline, arrows), obstacles (e.g. ground plane and overhead layers), scuba divers, and open areas for servoing. Through comprehensive benchmark analyses on cave systems in the USA, Mexico, and Spain, we demonstrate that robust deep visual models can be developed based on CaveSeg for fast semantic scene parsing of underwater cave environments. In particular, we formulate a novel transformer-based model that is computationally light and offers near real-time execution in addition to achieving state-of-the-art…
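To make the benchmark claim concrete, the sketch below shows one common way to score a semantic segmentation prediction against pixel annotations: per-class intersection over union (IoU) and its mean. This is an illustration only. The abstract does not say which metric the authors report, and the class names and their integer IDs here are hypothetical, taken loosely from the categories the abstract lists.

```python
from typing import List, Optional

# Hypothetical label set, named after the categories in the abstract.
# The real CaveSeg class list, names, and ID ordering may differ.
CLASSES = [
    "caveline",        # 0
    "arrow",           # 1
    "ground_plane",    # 2
    "overhead_layer",  # 3
    "scuba_diver",     # 4
    "open_area",       # 5
]

LabelMap = List[List[int]]  # one class ID per pixel, row-major


def per_class_iou(pred: LabelMap, truth: LabelMap,
                  num_classes: int) -> List[Optional[float]]:
    """Return IoU per class, or None for classes absent from both maps."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                # Pixel counts once toward both intersection and union.
                inter[p] += 1
                union[p] += 1
            else:
                # A mismatch enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    return [inter[c] / union[c] if union[c] else None
            for c in range(num_classes)]


def mean_iou(ious: List[Optional[float]]) -> float:
    """Average IoU over the classes that actually appear."""
    present = [v for v in ious if v is not None]
    return sum(present) / len(present) if present else 0.0


# Tiny 2x3 example: ground truth vs. a prediction with two wrong pixels.
truth = [[0, 0, 5],
         [2, 2, 5]]
pred = [[0, 5, 5],
        [2, 2, 2]]

ious = per_class_iou(pred, truth, len(CLASSES))
for name, value in zip(CLASSES, ious):
    if value is not None:
        print(f"{name}: {value:.3f}")
print(f"mean IoU: {mean_iou(ious):.3f}")
```

Thin structures such as the caveline occupy very few pixels, so a per-class breakdown like this tends to be more informative than overall pixel accuracy, which large regions would dominate.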
Taxonomy
Topics: Underwater Vehicles and Communication Systems · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications
