MARINER: A 3E-Driven Benchmark for Fine-Grained Perception and Complex Reasoning in Open-Water Environments

Xingming Liao; Ning Chen; Muying Shu; Yunpeng Yin; Peijian Zeng; Zhuowei Wang; Nankai Lin; Lianglun Cheng

arXiv:2604.08615 · cs.CV · April 13, 2026


TL;DR

MARINER is a benchmark for fine-grained perception and complex reasoning in open-water environments, designed to evaluate and advance maritime multimodal understanding.

Contribution

The paper introduces MARINER, a benchmark of multi-source maritime images covering fine-grained classification, object detection, and visual question answering, which fills a gap in realistic evaluation for maritime vision-language research.

Findings

1. Existing models struggle with fine-grained discrimination in marine scenes.
2. MARINER enables evaluation of causal reasoning in maritime contexts.
3. The benchmark promotes future research on robust maritime vision-language models.

Abstract

Fine-grained visual understanding and high-level reasoning in real-world open-water environments remain under-explored due to the lack of dedicated benchmarks. We introduce MARINER, a comprehensive benchmark built under the novel Entity-Environment-Event (3E) paradigm. MARINER contains 16,629 multi-source maritime images with 63 fine-grained vessel categories, diverse adverse environments, and 5 typical dynamic maritime incidents, covering fine-grained classification, object detection, and visual question answering tasks. We conduct extensive evaluations on mainstream Multimodal Large Language Models (MLLMs) and establish baselines, revealing that even advanced models struggle with fine-grained discrimination and causal reasoning in complex marine scenes. As a dedicated maritime benchmark, MARINER fills the gap of realistic and cognitive-level evaluation for maritime multimodal…
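The abstract names three task types (fine-grained classification, object detection, and visual question answering) organized along the Entity-Environment-Event axes, but this summary does not describe the scoring protocol. As a minimal sketch, assuming each test item carries a task name and a single 3E tag, the Python below tallies exact-match accuracy per (task, 3E dimension) pair and computes the box intersection-over-union that detection metrics commonly build on. The `Sample` record, its field names, the normalization rule, and the vessel labels are all hypothetical and do not reflect MARINER's actual annotation format or metrics.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL record layout: field names and values are illustrative
# assumptions, not MARINER's released annotation format.
@dataclass
class Sample:
    image_id: str
    task: str                  # e.g. "classification" or "vqa"
    dimension: str             # 3E axis: "entity", "environment", or "event"
    answer: str                # ground-truth label or answer string
    prediction: Optional[str]  # model output; None if the model did not answer


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace before exact-match comparison."""
    return " ".join(text.lower().split())


def accuracy_by_group(samples: List[Sample]) -> Dict[Tuple[str, str], float]:
    """Exact-match accuracy keyed by (task, 3E dimension)."""
    correct: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for s in samples:
        key = (s.task, s.dimension)
        total[key] += 1
        if s.prediction is not None and normalize(s.prediction) == normalize(s.answer):
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


def iou(box_a: Tuple[float, float, float, float],
        box_b: Tuple[float, float, float, float]) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes, the usual
    criterion for matching predicted and ground-truth detections."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical predictions for four illustrative test items.
    samples = [
        Sample("img_001", "classification", "entity", "Container ship", "container  ship"),
        Sample("img_002", "classification", "entity", "Tanker", "Bulk carrier"),
        Sample("img_003", "vqa", "event", "collision", "Collision"),
        Sample("img_004", "vqa", "environment", "fog", None),
    ]
    print(accuracy_by_group(samples))
    # {('classification', 'entity'): 0.5, ('vqa', 'event'): 1.0, ('vqa', 'environment'): 0.0}
    print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 4))
    # 0.1429
```

Reporting scores per 3E dimension rather than as one aggregate is one way to surface the kind of gap the findings describe, for example a model that names vessel types well but fails on incident questions. A real MLLM evaluation would also need answer extraction from free-form model output, which this sketch omits.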


Code & Models

Repositories

https://lxixim.github.io/MARINER
