MarineEval: Assessing the Marine Intelligence of Vision-Language Models

YuK-Kwan Wong; Tuan-An To; Jipeng Zhang; Ziqiang Zheng; Sai-Kit Yeung

arXiv:2512.21126·cs.CV·December 25, 2025

TL;DR

MarineEval introduces a large-scale marine-specific VLM dataset and benchmark, revealing current models' limitations in domain-specific marine question answering and highlighting the need for further advancements.

Contribution

This work creates the first comprehensive marine VLM dataset and benchmark, enabling evaluation of VLMs in marine domain expertise and identifying their current shortcomings.

Findings

01 Existing VLMs perform poorly on marine domain questions.

02 The dataset covers diverse tasks and capacities, ensuring broad evaluation.

03 Significant room for improvement remains in domain-specific VLM performance.

Abstract

We have witnessed promising progress led by large language models (LLMs) and further vision-language models (VLMs) in handling various queries as a general-purpose assistant. VLMs, as a bridge to connect the visual world and language corpus, receive both visual content and various text-only user instructions to generate corresponding responses. Though great success has been achieved by VLMs in various fields, in this work, we ask whether the existing VLMs can act as domain experts, accurately answering marine questions, which require significant domain expertise and address special domain challenges/requirements. To comprehensively evaluate the effectiveness and explore the boundary of existing VLMs, we construct the first large-scale marine VLM dataset and benchmark called MarineEval, with 2,000 image-based question-answering pairs. During our dataset construction, we ensure the…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling