Overview of the NLPCC 2025 Shared Task 4: Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge

Bin Li, Shenxi Liu, Yixuan Weng, Yue Du, Yuhang Tian, and Shoujun Zhou

arXiv:2505.06814·cs.CV·May 13, 2025

TL;DR

The paper introduces the M4IVQA challenge, a new benchmark for evaluating multi-modal, multilingual, and multi-hop question answering systems on medical instructional videos, aiming to advance healthcare AI research.

Contribution

It presents a challenge with three tracks (M4TAGSV, M4VCR, and M4TAGVC), each requiring multi-modal, multilingual, and multi-hop reasoning over medical instructional videos.

Findings

01 New benchmark datasets for medical video QA

02 Baseline results demonstrating current model capabilities

03 Encourages multi-modal, multilingual, multi-hop research in healthcare AI

Abstract

Following the successful hosting of the 1st (NLPCC 2023 Foshan) CMIVQA and the 2nd (NLPCC 2024 Hangzhou) MMIVQA challenges, a new task has been introduced this year to further advance research in multi-modal, multilingual, and multi-hop medical instructional question answering (M4IVQA) systems, with a specific focus on medical instructional videos. The M4IVQA challenge evaluates models that integrate information from medical instructional videos, understand multiple languages, and answer multi-hop questions requiring reasoning over various modalities. The task consists of three tracks: multi-modal, multilingual, and multi-hop Temporal Answer Grounding in Single Video (M4TAGSV); multi-modal, multilingual, and multi-hop Video Corpus Retrieval (M4VCR); and multi-modal, multilingual, and multi-hop Temporal Answer Grounding in Video Corpus (M4TAGVC). Participants in M4IVQA are…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
