UW-VOS: A Large-Scale Dataset for Underwater Video Object Segmentation

Hongshen Zhao; Jingkang Tai; Yuhang Wu; Wenkang Zhang; Xi Lan; Shangyan Wang; Tianyu Zhang; Wankou Yang

arXiv:2603.24006·cs.CV·March 26, 2026

Open Access

TL;DR

This paper introduces UW-VOS, a large-scale underwater video object segmentation dataset, and proposes SAM-U, a lightweight framework that adapts SAM2 to the underwater domain and reaches state-of-the-art results with about 2% of its parameters trainable.

Contribution

The paper presents the first large-scale underwater VOS dataset and a novel parameter-efficient adaptation framework for underwater video segmentation.

Findings

1. UW-VOS contains 1,431 videos and 309,295 mask annotations across 409 categories.
2. Existing methods lose an average of 13 J&F points on UW-VOS, highlighting the domain gap.
3. SAM-U achieves state-of-the-art results with only about 2% of its parameters trainable.
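
To give a feel for how a trainable share of around 2% can arise, the following sketch counts parameters for adapters added to a frozen encoder. It is only an illustration under stated assumptions: the summary does not describe the adapter design, so a generic bottleneck adapter (down-projection, nonlinearity, up-projection) is assumed, and every size below (frozen parameter count, block width, number of blocks, bottleneck width) is a hypothetical placeholder rather than a figure from the paper.

```python
def adapter_params(dim: int, bottleneck: int) -> int:
    """Parameter count of one bottleneck adapter (assumed design, not from the paper).

    Down-projection: dim x bottleneck weights plus bottleneck biases.
    Up-projection:   bottleneck x dim weights plus dim biases.
    """
    down = dim * bottleneck + bottleneck
    up = bottleneck * dim + dim
    return down + up


def trainable_fraction(frozen_params: int, block_dims: list[int], bottleneck: int) -> float:
    """Share of all parameters that are trainable when only the adapters are trained."""
    trainable = sum(adapter_params(d, bottleneck) for d in block_dims)
    return trainable / (frozen_params + trainable)


# Hypothetical configuration: an 80M-parameter frozen model with
# 24 encoder blocks of width 768, one adapter per block, bottleneck 32.
HYPOTHETICAL_FROZEN = 80_000_000
HYPOTHETICAL_DIMS = [768] * 24
fraction = trainable_fraction(HYPOTHETICAL_FROZEN, HYPOTHETICAL_DIMS, bottleneck=32)
print(f"trainable share: {fraction:.2%}")
```

The point of the sketch is the scaling: adapter cost grows with the bottleneck width and the number of blocks, while the frozen backbone dominates the total, so the trainable share stays in the low single digits.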

Abstract

Underwater Video Object Segmentation (VOS) is essential for marine exploration, yet open-air methods suffer significant degradation due to color distortion, low contrast, and prevalent camouflage. A primary hurdle is the lack of high-quality training data. To bridge this gap, we introduce UW-VOS, the first large-scale underwater VOS benchmark comprising 1,431 video sequences across 409 categories with 309,295 mask annotations, constructed via a semi-automatic data engine with rigorous human verification. We further propose SAM-U, a parameter-efficient framework that adapts SAM2 to the underwater domain. By inserting lightweight adapters into the image encoder, SAM-U achieves state-of-the-art performance with only ~2% trainable parameters. Extensive experiments reveal that existing methods experience an average 13-point J&F drop on…
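
For readers unfamiliar with the metric, J&F in VOS benchmarks is conventionally the average of region similarity J (intersection over union between predicted and ground-truth masks) and boundary accuracy F (an F-measure on mask contours). The sketch below computes only the J half on toy masks. It is a minimal illustration, not the paper's evaluation code: the function names are hypothetical, masks are plain nested lists of 0/1, and treating two empty masks as a perfect match is an assumed convention.

```python
def region_similarity(pred, gt):
    """Jaccard index (IoU) of two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_region_similarity(pred_frames, gt_frames):
    """Average J over the frames of one object track."""
    scores = [region_similarity(p, g) for p, g in zip(pred_frames, gt_frames)]
    return sum(scores) / len(scores)


# Toy two-frame example with 2x2 masks.
pred_frames = [[[1, 1], [0, 0]], [[0, 0], [0, 0]]]
gt_frames = [[[1, 0], [1, 0]], [[0, 0], [0, 0]]]
print(mean_region_similarity(pred_frames, gt_frames))
```

Scores are usually reported on a 0 to 100 scale, so a 13-point drop corresponds to a fall of 0.13 in the averaged J and F values. The F term is omitted here because it requires extracting and matching mask boundaries.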

Taxonomy

Topics: Image Enhancement Techniques · Advanced Neural Network Applications · Visual Attention and Saliency Detection