J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM

Takero Yoshida; Yuikazu Ito; Yoshihiro Fujiwara; Shinji Tsuchida; Daisuke Sugiyama; Daisuke Matsuoka

arXiv:2412.15574·cs.CV·December 23, 2024

Open Access

TL;DR

This paper introduces J-EDI QA, a benchmark dataset for deep-sea organism image understanding using multimodal LLMs, revealing current models' limitations in expert-level deep-sea species comprehension.

Contribution

The paper presents a new benchmark dataset for deep-sea organism image understanding in Japanese, enabling evaluation of multimodal LLMs in this specialized domain.

Findings

01

OpenAI o1 achieved 50% accuracy on the benchmark.

02

Current models are not yet at expert-level understanding of deep-sea species.

03

The benchmark highlights the need for specialized deep-sea LLMs.

Abstract

The Japan Agency for Marine-Earth Science and Technology (JAMSTEC) has made available the JAMSTEC Earth Deep-sea Image (J-EDI), a deep-sea video and image archive (https://www.godac.jamstec.go.jp/jedi/e/index.html). This archive serves as a valuable resource for researchers and scholars interested in deep-sea imagery. The dataset comprises images and videos of deep-sea phenomena, predominantly of marine organisms, but also of the seafloor and physical processes. In this study, we propose J-EDI QA, a benchmark for evaluating how well a multimodal large language model (LLM) understands images of deep-sea organisms. The benchmark comprises 100 images, each accompanied by a four-option multiple-choice question and answer written by JAMSTEC researchers. The QA pairs are provided in Japanese, so the benchmark assesses the ability to understand deep-sea species in Japanese. In the evaluation presented in this…
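To put the reported accuracy in context, the sketch below scores a four-option multiple-choice benchmark and compares the result with the 25% expected from random guessing. It is a minimal illustration, not the authors' evaluation code: the `QAItem` fields, the `accuracy` function and the toy data are all hypothetical, since this page does not describe the benchmark's file format.

```python
from dataclasses import dataclass


@dataclass
class QAItem:
    """One benchmark entry (hypothetical schema, not the paper's format)."""
    image_id: str
    question: str
    options: tuple      # the four answer choices
    answer_index: int   # index (0-3) of the correct choice


# With four options per question, uniform random guessing scores 25%.
CHANCE_ACCURACY = 1 / 4


def accuracy(items, predictions):
    """Return the fraction of items whose predicted index matches the answer."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    if not items:
        raise ValueError("cannot score an empty benchmark")
    correct = sum(
        1 for item, pred in zip(items, predictions) if pred == item.answer_index
    )
    return correct / len(items)


# Toy data: four placeholder items whose correct answers are options 0, 1, 2, 3.
toy_items = [
    QAItem(f"img{i}", "placeholder question", ("A", "B", "C", "D"), i)
    for i in range(4)
]

# A "model" that always picks option 0 is right on exactly one of the four items.
always_first = [0, 0, 0, 0]
toy_score = accuracy(toy_items, always_first)
print(f"accuracy={toy_score:.2f}, chance={CHANCE_ACCURACY:.2f}")
```

Under this scoring, the 50% reported for OpenAI o1 in the Findings corresponds to 50 correct answers out of 100, twice the chance baseline but well short of a perfect score.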

Taxonomy

Topics: Advanced Computational Techniques and Applications