TL;DR
This paper introduces a hierarchical deep multi-modal network for medical visual question answering that classifies question types before predicting answers, outperforming baseline models on the RAD and CLEF18 datasets.
Contribution
The paper proposes a novel question segregation technique integrated into a hierarchical neural network for medical VQA, enhancing answer relevance and accuracy.
Findings
Outperforms baseline models on RAD and CLEF18 datasets
Question segregation improves answer accuracy
Provides a detailed analysis of errors and their solutions
Abstract
Visual Question Answering in the medical domain (VQA-Med) plays an important role in providing medical assistance to end-users. These users are expected to raise either a straightforward question with a Yes/No answer or a challenging question that requires a detailed and descriptive answer. Existing VQA-Med techniques fail to distinguish between the different question types, which sometimes complicates the simpler problems or over-simplifies the complicated ones. At the same time, using several distinct systems for different question types can lead to confusion and discomfort for end-users. To address this issue, we propose a hierarchical deep multi-modal network that analyzes and classifies end-user questions/queries and then incorporates a query-specific approach for answer prediction. We refer to our proposed approach as Hierarchical Question Segregation based Visual Question…
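The two-stage flow described in the abstract (classify the question first, then hand it to a question-type-specific answer predictor) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the keyword rule in `segregate_question` stands in for the paper's learned question classifier, and `answer_yes_no` and `answer_descriptive` are hypothetical placeholders for the two answer-prediction branches, not the authors' models.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical leading words that signal a Yes/No question.
# This keyword rule is only a stand-in for a learned question-type classifier.
YES_NO_STARTERS = {"is", "are", "does", "do", "did", "was", "were", "can", "has", "have"}


def segregate_question(question: str) -> str:
    """Return 'yes_no' or 'descriptive' based on the first word of the question."""
    words = question.strip().lower().split()
    first = words[0] if words else ""
    return "yes_no" if first in YES_NO_STARTERS else "descriptive"


def answer_yes_no(image: Any, question: str) -> str:
    """Placeholder branch: a real system would run a binary classifier here."""
    return "yes"


def answer_descriptive(image: Any, question: str) -> str:
    """Placeholder branch: a real system would predict a descriptive answer here."""
    return "<descriptive answer>"


# Route each question type to its own answer-prediction branch.
BRANCHES: Dict[str, Callable[[Any, str], str]] = {
    "yes_no": answer_yes_no,
    "descriptive": answer_descriptive,
}


def answer(image: Any, question: str) -> Tuple[str, str]:
    """Classify the question, then dispatch to the matching branch."""
    qtype = segregate_question(question)
    return qtype, BRANCHES[qtype](image, question)
```

Calling `answer(image, "Is there a fracture?")` routes to the Yes/No branch, while `answer(image, "What abnormality is seen?")` routes to the descriptive branch. The point of the sketch is only the routing structure: a single entry point for the user, with the question type deciding which predictor runs.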
