Consistency-preserving Visual Question Answering in Medical Imaging
Sergio Tascon-Morales, Pablo Márquez-Neila, Raphael Sznitman

TL;DR
This paper introduces a novel training method for medical VQA models that enhances answer consistency and accuracy by incorporating known relations between questions, demonstrated on diabetic macular edema staging.
Contribution
It proposes a new loss function and training procedure that integrate question relations to improve consistency and accuracy in medical VQA systems.
Findings
Outperforms state-of-the-art baselines in consistency and accuracy
Improves trustworthiness of medical VQA models
Validated on diabetic macular edema staging
Abstract
Visual Question Answering (VQA) models take an image and a natural-language question as input and infer the answer to the question. Recently, VQA systems in medical imaging have gained popularity thanks to potential advantages such as patient engagement and second opinions for clinicians. While most research efforts have focused on improving architectures and overcoming data-related limitations, answer consistency has been overlooked even though it plays a critical role in establishing trustworthy models. In this work, we propose a novel loss function and corresponding training procedure that allows relations between questions to be included in the training process. Specifically, we consider the case where implications between perception and reasoning questions are known a priori. To show the benefits of our approach, we evaluate it on the clinically relevant task of Diabetic…
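The abstract does not spell out the loss, so what follows is only an illustration of the general idea of adding a question-relation term to a standard VQA objective. It is not the authors' formulation. It assumes each known relation is a (main, sub) pair in which the ground-truth answer to the main (reasoning) question implies the ground-truth answer to the sub (perception) question. As the consistency term it swaps in a Łukasiewicz-style soft implication penalty, max(0, p_main - p_sub). The function names, the pair encoding and the weight `lam` are hypothetical. Values are computed in plain Python for readability; a training implementation would use an autodiff framework.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of answer logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of the ground-truth answer (log-sum-exp form)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def implication_penalty(p_main: float, p_sub: float) -> float:
    """Soft violation of 'main answer => sub answer' (hypothetical choice).

    Under a Lukasiewicz-style relaxation the implication is fully satisfied
    when P(sub) >= P(main); otherwise the violation is P(main) - P(sub).
    """
    return max(0.0, p_main - p_sub)


def consistency_aware_loss(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    pairs: Sequence[Tuple[int, int]],
    lam: float = 0.5,
) -> float:
    """Mean VQA cross-entropy plus lam times the mean implication penalty.

    `pairs` holds hypothetical (main_index, sub_index) entries into the batch,
    meaning the main question's true answer implies the sub-question's.
    """
    # Standard VQA term: mean cross-entropy over every question in the batch.
    vqa = sum(cross_entropy(l, t) for l, t in zip(logits, targets)) / len(logits)
    if not pairs:
        return vqa
    # Consistency term: penalise the model when it is more confident in the
    # main answer than in the sub-answer that the main answer implies.
    cons = 0.0
    for main, sub in pairs:
        p_main = softmax(logits[main])[targets[main]]
        p_sub = softmax(logits[sub])[targets[sub]]
        cons += implication_penalty(p_main, p_sub)
    return vqa + lam * cons / len(pairs)
```

For example, with `logits = [[2.0, 0.0], [0.0, 0.0]]`, `targets = [0, 0]` and `pairs = [(0, 1)]`, the model is about 88% confident in the main answer but only 50% confident in the implied sub-answer, so the consistency term adds roughly `lam * 0.38` to the loss. Reversing the pair gives a zero penalty because the implication is then satisfied. The penalty is inactive whenever the model is consistent, so `lam` only trades accuracy against consistency on violating pairs.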
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
