Overview of the NLPCC 2025 Shared Task 4: Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge
Bin Li, Shenxi Liu, Yixuan Weng, Yue Du, Yuhang Tian, and Shoujun Zhou

TL;DR
The paper introduces the M4IVQA challenge, a new benchmark for evaluating multi-modal, multilingual, and multi-hop question answering systems on medical instructional videos, aiming to advance healthcare AI research.
Contribution
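It presents a challenge with three tracks (M4TAGSV, M4VCR, and M4TAGVC) covering temporal answer grounding in a single video, video corpus retrieval, and temporal answer grounding in a video corpus, each requiring multi-modal, multilingual, and multi-hop reasoning over medical instructional videos.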
Findings
New benchmark datasets for medical video QA
Baseline results demonstrating current model capabilities
Encourages multi-modal, multilingual, multi-hop research in healthcare AI
Abstract
Following the successful hosting of the 1st (NLPCC 2023 Foshan) CMIVQA and the 2nd (NLPCC 2024 Hangzhou) MMIVQA challenges, this year a new task has been introduced to further advance research in multi-modal, multilingual, and multi-hop medical instructional question answering (M4IVQA) systems, with a specific focus on medical instructional videos. The M4IVQA challenge evaluates models that integrate information from medical instructional videos, understand multiple languages, and answer multi-hop questions requiring reasoning over various modalities. The task consists of three tracks: multi-modal, multilingual, and multi-hop Temporal Answer Grounding in Single Video (M4TAGSV); multi-modal, multilingual, and multi-hop Video Corpus Retrieval (M4VCR); and multi-modal, multilingual, and multi-hop Temporal Answer Grounding in Video Corpus (M4TAGVC). Participants in M4IVQA are…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
Methods: Focus
