UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding

Da Zhang; Chenggang Rong; Bingyu Li; Feiyu Wang; Zhiyuan Zhao; Junyu Gao; Xuelong Li

arXiv:2510.18262 · cs.CV · October 22, 2025

TL;DR

UWBench is a benchmark dataset for evaluating and advancing vision-language models in underwater environments. It targets challenges specific to that setting, such as light attenuation, color distortion, and the need for marine ecology knowledge.

Contribution

The paper introduces UWBench, a large-scale, annotated underwater dataset and benchmarks for image captioning, visual grounding, and question answering in marine settings, filling a critical gap in underwater AI research.

Findings

01. State-of-the-art models perform poorly on underwater tasks
02. Underwater understanding remains a significant challenge for current VLMs
03. UWBench enables targeted improvements in marine environment AI applications

Abstract

Large vision-language models (VLMs) have achieved remarkable success in natural scene understanding, yet their application to underwater environments remains largely unexplored. Underwater imagery presents unique challenges including severe light attenuation, color distortion, and suspended particle scattering, while requiring specialized knowledge of marine ecosystems and organism taxonomy. To bridge this gap, we introduce UWBench, a comprehensive benchmark specifically designed for underwater vision-language understanding. UWBench comprises 15,003 high-resolution underwater images captured across diverse aquatic environments, encompassing oceans, coral reefs, and deep-sea habitats. Each image is enriched with human-verified annotations including 15,281 object referring expressions that precisely describe marine organisms and underwater structures, and 124,983 question-answer pairs…
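
To put the dataset sizes quoted in the abstract in proportion, the following minimal sketch computes annotation density per image. Only the three counts come from the abstract; the `BenchmarkCounts` class, its field names, and the `per_image` helper are hypothetical names introduced here for illustration and are not part of any UWBench release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCounts:
    """Hypothetical container for the headline counts quoted in the abstract."""

    images: int
    referring_expressions: int
    qa_pairs: int

    def per_image(self) -> dict:
        """Return the average number of annotations of each kind per image."""
        return {
            "referring_expressions": self.referring_expressions / self.images,
            "qa_pairs": self.qa_pairs / self.images,
        }


# Counts as stated in the abstract; everything else is an assumption.
UWBENCH = BenchmarkCounts(
    images=15_003,
    referring_expressions=15_281,
    qa_pairs=124_983,
)

density = UWBENCH.per_image()

if __name__ == "__main__":
    for kind, value in density.items():
        print(f"{kind}: {value:.2f} per image")
```

Under these counts, each image carries roughly one referring expression and a little over eight question-answer pairs on average. The abstract does not say how annotations are distributed across individual images, so these are dataset-wide averages only.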

Code & Models

Datasets

da1018/UWBench
dataset · 84 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning