UVLM: Benchmarking Video Language Model for Underwater World Understanding

Xizhe Xue; Yang Zhou; Dawei Yan; Lijie Tao; Junjie Li; Ying Li; Haokui Zhang; Rong Xiao

arXiv:2507.02373 · cs.CV · November 19, 2025

TL;DR

UVLM introduces a comprehensive underwater video language benchmark to evaluate and improve the understanding of marine environments by VidLMs, addressing a gap in existing terrestrial-focused datasets.

Contribution

The paper presents UVLM, a new underwater video language benchmark with diverse data, tasks, and evaluation metrics, specifically designed for marine environment understanding.

Findings

01

Fine-tuning VidLMs on UVLM enhances underwater understanding.

02

Fine-tuning on UVLM also improves performance on existing in-air VidLM benchmarks.

03

The dataset covers diverse underwater scenarios and tasks.

Abstract

Recently, the remarkable success of large language models (LLMs) has had a profound impact on the field of artificial intelligence. Numerous advanced works based on LLMs have been proposed and applied in various scenarios. Among them, video language models (VidLMs) are particularly widely used. However, existing works primarily focus on terrestrial scenarios, overlooking the highly demanding application needs of underwater observation. To bridge this gap, we introduce UVLM, an underwater observation benchmark built through a collaborative approach combining human expertise and AI models. To ensure data quality, we considered the benchmark's design from multiple perspectives. First, to address the unique challenges of underwater environments, we selected videos that represent typical underwater challenges including light variations, water turbidity, and diverse…
