UW-VOS: A Large-Scale Dataset for Underwater Video Object Segmentation
Hongshen Zhao, Jingkang Tai, Yuhang Wu, Wenkang Zhang, Xi Lan, Shangyan Wang, Tianyu Zhang, Wankou Yang

TL;DR
This paper introduces UW-VOS, a large-scale underwater video object segmentation dataset, and proposes SAM-U, a lightweight domain-adaptive framework that significantly improves segmentation performance in underwater environments.
Contribution
The paper presents the first large-scale underwater VOS dataset and a novel parameter-efficient adaptation framework for underwater video segmentation.
Findings
UW-VOS contains 1,431 videos and 309,295 annotations across 409 categories.
Existing methods suffer an average 13-point performance drop on UW-VOS, highlighting the domain gap between open-air and underwater video.
SAM-U achieves state-of-the-art results with only 2% trainable parameters.
Abstract
Underwater Video Object Segmentation (VOS) is essential for marine exploration, yet open-air methods suffer significant degradation due to color distortion, low contrast, and prevalent camouflage. A primary hurdle is the lack of high-quality training data. To bridge this gap, we introduce UW-VOS, the first large-scale underwater VOS benchmark comprising 1,431 video sequences across 409 categories with 309,295 mask annotations, constructed via a semi-automatic data engine with rigorous human verification. We further propose SAM-U, a parameter-efficient framework that adapts SAM2 to the underwater domain. By inserting lightweight adapters into the image encoder, SAM-U achieves state-of-the-art performance with only 2% trainable parameters. Extensive experiments reveal that existing methods experience an average 13-point drop on…
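To make the "2% trainable parameters" figure concrete, the sketch below counts the parameters that a set of bottleneck adapters would add to a frozen encoder. It is an illustration only: the bottleneck adapter form (down-projection followed by up-projection), the function names, and every number used (encoder size, block count, widths) are assumptions chosen for the example and are not taken from the paper or from SAM2.

```python
def adapter_param_count(dim: int, bottleneck: int) -> int:
    """Parameters of one hypothetical bottleneck adapter.

    Assumed form: a linear down-projection (dim -> bottleneck) followed by
    a linear up-projection (bottleneck -> dim), each with a bias.
    """
    down = dim * bottleneck + bottleneck  # weights + bias
    up = bottleneck * dim + dim           # weights + bias
    return down + up


def trainable_fraction(frozen_params: int, num_blocks: int,
                       dim: int, bottleneck: int) -> float:
    """Share of all parameters that are trainable when only adapters train.

    Assumes one adapter per encoder block and a fully frozen backbone.
    """
    adapters = num_blocks * adapter_param_count(dim, bottleneck)
    return adapters / (frozen_params + adapters)


if __name__ == "__main__":
    # Illustrative values only, not the paper's configuration.
    frac = trainable_fraction(frozen_params=80_000_000, num_blocks=24,
                              dim=1024, bottleneck=32)
    print(f"trainable share: {frac:.2%}")
```

Under these made-up settings the adapters account for a few percent of the total, which is the general regime the abstract describes; the actual adapter design and sizes used by SAM-U would need to be taken from the paper itself.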
Taxonomy
Topics: Image Enhancement Techniques · Advanced Neural Network Applications · Visual Attention and Saliency Detection
