Exploring the Underwater World Segmentation without Extra Training

Bingyu Li; Tao Huo; Da Zhang; Zhiyuan Zhao; Junyu Gao; Xuelong Li

arXiv:2511.07923 · cs.CV · March 18, 2026

Open Access

TL;DR

This paper introduces AquaOV255, a large underwater segmentation dataset, and UOVSBench, a benchmark for open-vocabulary underwater segmentation, along with Earth2Ocean, a training-free model that transfers terrestrial vision-language models to underwater scenes.

Contribution

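The paper presents the first large-scale, fine-grained underwater segmentation dataset (AquaOV255), the first underwater open-vocabulary segmentation benchmark (UOVSBench), and a training-free framework (Earth2Ocean) that applies terrestrial vision-language models to underwater segmentation.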

Findings

01

Earth2Ocean improves segmentation performance on underwater datasets without any additional underwater training.

02

AquaOV255 covers 255 categories with over 20K images.

03

UOVSBench enables comprehensive evaluation of open-vocabulary underwater segmentation.

Abstract

Accurate segmentation of marine organisms is vital for biodiversity monitoring and ecological assessment, yet existing datasets and models remain largely limited to terrestrial scenes. To bridge this gap, we introduce AquaOV255, the first large-scale and fine-grained underwater segmentation dataset, containing 255 categories and over 20K images and covering diverse categories for open-vocabulary (OV) evaluation. Furthermore, we establish the first underwater OV segmentation benchmark, UOVSBench, by integrating AquaOV255 with five additional underwater datasets to enable comprehensive evaluation. We also present Earth2Ocean, a training-free OV segmentation framework that transfers terrestrial vision-language models (VLMs) to underwater domains without any additional underwater training. Earth2Ocean consists of two core components: a Geometric-guided Visual…
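To make the idea of training-free open-vocabulary segmentation concrete, the sketch below shows the generic pattern such approaches commonly build on: each image patch is labeled with the class name whose text embedding is most similar to the patch feature, with no gradient updates. This is not Earth2Ocean itself, whose components are cut off in the abstract above. The function names, the toy 3-dimensional vectors, and the class list are all hypothetical stand-ins for the outputs of a real vision-language model.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def segment_open_vocab(
    patch_feats: Sequence[Sequence[Vector]],
    text_embeds: Sequence[Vector],
) -> List[List[int]]:
    """Label each patch with the index of its most similar text embedding.

    patch_feats: H x W grid of D-dimensional patch features (hypothetical
                 stand-in for a frozen image encoder's output).
    text_embeds: C class-name embeddings of dimension D (hypothetical
                 stand-in for a frozen text encoder's output).
    """
    labels = []
    for row in patch_feats:
        label_row = []
        for feat in row:
            sims = [cosine(feat, t) for t in text_embeds]
            label_row.append(max(range(len(sims)), key=sims.__getitem__))
        labels.append(label_row)
    return labels


# Toy data: assumed values, not taken from the paper or any real model.
class_names = ["fish", "coral", "water"]
text_embeds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
patch_feats = [
    [[0.9, 0.1, 0.0], [0.1, 0.2, 0.9]],
    [[0.2, 0.8, 0.1], [0.0, 0.1, 0.7]],
]

label_map = segment_open_vocab(patch_feats, text_embeds)
named = [[class_names[i] for i in row] for row in label_map]
print(named)
```

Because the vocabulary is just a list of class names encoded as text, changing the categories only means changing `class_names` and recomputing `text_embeds`; no retraining is involved, which is what makes this family of methods a natural fit for a 255-category underwater benchmark.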

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning