Medico 2025: Visual Question Answering for Gastrointestinal Imaging
Sushant Gautam, Vajira Thambawita, Michael Riegler, Pål Halvorsen, and Steven Hicks

TL;DR
The Medico 2025 challenge advances VQA for GI imaging by developing explainable AI models that answer complex questions and provide interpretable medical justifications, using a large annotated dataset.
Contribution
It introduces a new benchmark dataset and challenge for explainable VQA in gastrointestinal imaging, promoting trustworthy AI in clinical decision support.
Findings
Development of models with improved accuracy on GI VQA tasks
Generation of clinically relevant explanations alongside answers
Establishment of a comprehensive dataset for GI VQA benchmarking
Abstract
The Medico 2025 challenge addresses Visual Question Answering (VQA) for Gastrointestinal (GI) imaging, organized as part of the MediaEval task series. The challenge focuses on developing Explainable Artificial Intelligence (XAI) models that answer clinically relevant questions based on GI endoscopy images while providing interpretable justifications aligned with medical reasoning. It introduces two subtasks: (1) answering diverse types of visual questions using the Kvasir-VQA-x1 dataset, and (2) generating multimodal explanations to support clinical decision-making. The Kvasir-VQA-x1 dataset, created from 6,500 images and 159,549 complex question-answer (QA) pairs, serves as the benchmark for the challenge. By combining quantitative performance metrics and expert-reviewed explainability assessments, this task aims to advance trustworthy Artificial Intelligence (AI) in medical image…
