Kvasir-VQA: A Text-Image Pair GI Tract Dataset
Sushant Gautam, Andrea Storås, Cise Midoglu, Steven A. Hicks, Vajira Thambawita, Pål Halvorsen, Michael A. Riegler

TL;DR
Kvasir-VQA is a comprehensive, annotated dataset of GI tract images with question-answer pairs, designed to advance machine learning applications like VQA, image captioning, and object detection in medical diagnostics.
Contribution
We created and released Kvasir-VQA, a large annotated dataset for GI tract images with diverse question types, enabling new research in medical image analysis and diagnostics.
Findings
Showed that the dataset supports effective training of models for VQA, captioning, and object detection tasks.
Demonstrated the dataset's utility in improving GI diagnostic tools.
Provided evaluation metrics for multiple medical image analysis tasks.
Abstract
We introduce Kvasir-VQA, an extended dataset derived from the HyperKvasir and Kvasir-Instrument datasets, augmented with question-and-answer annotations to facilitate advanced machine learning tasks in Gastrointestinal (GI) diagnostics. This dataset comprises 6,500 annotated images spanning various GI tract conditions and surgical instruments, and it supports multiple question types including yes/no, choice, location, and numerical count. The dataset is intended for applications such as image captioning, Visual Question Answering (VQA), text-based generation of synthetic medical images, object detection, and classification. Our experiments demonstrate the dataset's effectiveness in training models for three selected tasks, showcasing significant applications in medical image analysis and diagnostics. We also present evaluation metrics for each task, highlighting the usability and…
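The abstract names four question types (yes/no, choice, location, and numerical count). As a rough illustration of how such annotations might be organized, here is a minimal Python sketch. The class name `QAPair`, its field names, the type labels, and the example records are all assumptions made for illustration; they are not the dataset's actual schema or contents.

```python
from collections import Counter
from dataclasses import dataclass

# The four question types named in the abstract.
# The exact string labels used here are an assumption.
QUESTION_TYPES = ("yes/no", "choice", "location", "numerical count")


@dataclass(frozen=True)
class QAPair:
    """One hypothetical question-answer annotation attached to an image."""
    image_id: str
    question: str
    answer: str
    question_type: str

    def __post_init__(self):
        # Reject labels outside the four types listed in the abstract.
        if self.question_type not in QUESTION_TYPES:
            raise ValueError(f"unknown question type: {self.question_type!r}")


def count_by_type(pairs):
    """Count annotations per question type."""
    return Counter(p.question_type for p in pairs)


# Made-up illustrative records, not taken from the dataset.
examples = [
    QAPair("img_0001", "Is there a polyp in the image?", "yes", "yes/no"),
    QAPair("img_0001", "How many polyps are visible?", "1", "numerical count"),
    QAPair("img_0002", "Where is the instrument located?", "center", "location"),
]
counts = count_by_type(examples)
```

Grouping by question type in this way would make it straightforward to report results separately per type, since a yes/no answer and a numerical count are usually scored differently.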
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques
Methods: Diffusion
