Medico 2025: Visual Question Answering for Gastrointestinal Imaging

Sushant Gautam, Vajira Thambawita, Michael Riegler, Pål Halvorsen, and Steven Hicks

arXiv:2508.10869 · cs.CV · August 15, 2025

TL;DR

The Medico 2025 challenge aims to advance VQA for GI imaging by promoting explainable AI models that answer complex clinical questions and provide interpretable medical justifications, using the large annotated Kvasir-VQA-x1 dataset.

Contribution

It introduces a new benchmark dataset and challenge for explainable VQA in gastrointestinal imaging, promoting trustworthy AI in clinical decision support.

Findings

1. Development of models with improved accuracy on GI VQA tasks
2. Generation of clinically relevant explanations alongside answers
3. Establishment of a comprehensive dataset for GI VQA benchmarking

Abstract

The Medico 2025 challenge addresses Visual Question Answering (VQA) for Gastrointestinal (GI) imaging, organized as part of the MediaEval task series. The challenge focuses on developing Explainable Artificial Intelligence (XAI) models that answer clinically relevant questions based on GI endoscopy images while providing interpretable justifications aligned with medical reasoning. It introduces two subtasks: (1) answering diverse types of visual questions using the Kvasir-VQA-x1 dataset, and (2) generating multimodal explanations to support clinical decision-making. The Kvasir-VQA-x1 dataset, created from 6,500 images and 159,549 complex question-answer (QA) pairs, serves as the benchmark for the challenge. By combining quantitative performance metrics and expert-reviewed explainability assessments, this task aims to advance trustworthy Artificial Intelligence (AI) in medical image…
