J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM
Takero Yoshida, Yuikazu Ito, Yoshihiro Fujiwara, Shinji Tsuchida, Daisuke Sugiyama, Daisuke Matsuoka

TL;DR
This paper introduces J-EDI QA, a benchmark dataset for deep-sea organism image understanding using multimodal LLMs, revealing current models' limitations in expert-level deep-sea species comprehension.
Contribution
The paper presents a new benchmark dataset for deep-sea organism image understanding in Japanese, enabling evaluation of multimodal LLMs in this specialized domain.
Findings
OpenAI o1 achieved 50% accuracy on the benchmark.
Current models are not yet at expert-level understanding of deep-sea species.
The benchmark highlights the need for specialized deep-sea LLMs.
Abstract
The Japan Agency for Marine-Earth Science and Technology (JAMSTEC) has made available the JAMSTEC Earth Deep-sea Image (J-EDI), a deep-sea video and image archive (https://www.godac.jamstec.go.jp/jedi/e/index.html). This archive serves as a valuable resource for researchers and scholars interested in deep-sea imagery. The dataset comprises images and videos of deep-sea phenomena, predominantly of marine organisms, but also of the seafloor and physical processes. In this study, we propose J-EDI QA, a benchmark for understanding images of deep-sea organisms using a multimodal large language model (LLM). The benchmark comprises 100 images, each accompanied by a four-option multiple-choice question and answer written by JAMSTEC researchers. The QA pairs are provided in Japanese, so the benchmark assesses the ability to understand deep-sea species in Japanese. In the evaluation presented in this…
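As a rough illustration of how a four-option benchmark like the one described in the abstract could be scored, the following is a minimal sketch, not the authors' evaluation code. The `QAItem` fields, the `accuracy` and `chance_accuracy` helpers, and the idea of storing the correct answer as an option index are all assumptions made for this example; the actual J-EDI QA release format is not described here.

```python
from dataclasses import dataclass


@dataclass
class QAItem:
    """One hypothetical benchmark entry: an image plus a four-option question."""
    image_path: str      # assumed field: path to the deep-sea image
    question: str        # question text (Japanese in the real benchmark)
    options: list[str]   # the four answer choices
    answer_index: int    # assumed field: index (0-3) of the correct choice


def accuracy(items: list[QAItem], predictions: list[int]) -> float:
    """Fraction of items where the predicted option index matches the answer."""
    if not items:
        raise ValueError("items must not be empty")
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    correct = sum(
        1 for item, pred in zip(items, predictions) if pred == item.answer_index
    )
    return correct / len(items)


def chance_accuracy(num_options: int = 4) -> float:
    """Expected accuracy of uniform random guessing among num_options choices."""
    return 1.0 / num_options
```

With four options per question, uniform random guessing is expected to score 25%, which gives a baseline against which a reported accuracy such as 50% can be read.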
Taxonomy
Topics: Advanced Computational Techniques and Applications
