NAUTILUS: A Large Multimodal Model for Underwater Scene Understanding
Wei Xu, Cheng Wang, Dingkang Liang, Zongchuang Zhao, Xingyu Jiang, Peng Zhang, Xiang Bai

TL;DR
NAUTILUS is a large multimodal model designed for comprehensive underwater scene understanding, leveraging a new dataset and a feature enhancement module to improve robustness and performance in underwater exploration tasks.
Contribution
The paper introduces NautData, a large-scale multi-task underwater dataset, and a novel feature enhancement module (VFE); together they advance multi-task underwater scene understanding with improved robustness and accuracy.
Findings
The VFE module improves model performance across tasks
NAUTILUS outperforms baselines on underwater datasets
NautData is the first large-scale multi-task underwater dataset
Abstract
Underwater exploration offers critical insights into our planet and is attracting increasing attention for its broader applications in resource exploration, national security, and other areas. We study underwater scene understanding methods, which aim to enable automated underwater exploration. Underwater scene understanding requires multi-task perception at multiple granularities. However, the absence of large-scale, multi-task underwater instruction-tuning datasets has hindered progress in this area. To bridge this gap, we construct NautData, a dataset of 1.45M image-text pairs supporting eight underwater scene understanding tasks, which enables both the development and the thorough evaluation of underwater scene understanding models. Underwater image degradation is a widely recognized challenge that interferes with underwater tasks. To improve the robustness of underwater scene…
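To make "underwater image degradation" concrete, a commonly used simplified image formation model treats each observed color channel as the clear scene color attenuated by the water column plus a backscattered veiling light: I_c = J_c * t_c + B_c * (1 - t_c), with transmission t_c = exp(-beta_c * d). The sketch below is a minimal illustration of that generic model only; it is not the NAUTILUS or VFE method. The function names and the attenuation and backlight values are hypothetical, chosen only to show the typical effect.

```python
import math


def degrade_pixel(clear_rgb, depth_m, beta_rgb, backlight_rgb):
    """Apply a simplified underwater image formation model to one pixel.

    Generic model (illustrative, not the paper's method):
        I_c = J_c * t_c + B_c * (1 - t_c),  t_c = exp(-beta_c * d)
    Colors are in [0, 1]; beta_c is a hypothetical per-metre attenuation
    coefficient and B_c a hypothetical backlight (veiling light) color.
    """
    degraded = []
    for j, beta, b in zip(clear_rgb, beta_rgb, backlight_rgb):
        t = math.exp(-beta * depth_m)           # fraction of direct signal that survives
        degraded.append(j * t + b * (1.0 - t))  # attenuated signal plus backscatter
    return tuple(degraded)


def restore_pixel(observed_rgb, depth_m, beta_rgb, backlight_rgb):
    """Invert the same model, assuming depth, beta and backlight are known."""
    restored = []
    for i, beta, b in zip(observed_rgb, beta_rgb, backlight_rgb):
        t = math.exp(-beta * depth_m)
        restored.append((i - b * (1.0 - t)) / t)  # remove backscatter, undo attenuation
    return tuple(restored)
```

For example, degrade_pixel((0.8, 0.5, 0.3), 5.0, (0.6, 0.1, 0.05), (0.05, 0.35, 0.45)) drives the red channel close to zero while green and blue are pulled toward the bluish backlight, which is the familiar color cast and contrast loss. In real images depth, attenuation and backlight are unknown, so the inversion above is only a toy; learned approaches exist precisely because these quantities must be estimated.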
Taxonomy
Topics: Image Enhancement Techniques · Multimodal Machine Learning Applications · Advanced Neural Network Applications
