Hierarchical Deep Multi-modal Network for Medical Visual Question Answering

Deepak Gupta; Swati Suman; Asif Ekbal

arXiv:2009.12770·cs.CL·September 29, 2020

TL;DR

This paper introduces a hierarchical deep multi-modal network for medical visual question answering that first classifies each question by type and then applies a type-specific approach to answer prediction, outperforming baseline models on the RAD and CLEF18 datasets.

Contribution

The paper proposes a novel question segregation technique integrated into a hierarchical neural network for medical VQA, enhancing answer relevance and accuracy.

Findings

1. Outperforms baseline models on the RAD and CLEF18 datasets
2. Question segregation improves answer accuracy
3. Detailed analysis of errors and solutions

Abstract

Visual Question Answering in the medical domain (VQA-Med) plays an important role in providing medical assistance to end-users. These users are expected to raise either a straightforward question with a Yes/No answer or a challenging question that requires a detailed and descriptive answer. Existing VQA-Med techniques fail to distinguish between the different question types, which sometimes complicates the simpler problems or over-simplifies the complicated ones. At the same time, maintaining several distinct systems for different question types can lead to confusion and discomfort for end-users. To address this issue, we propose a hierarchical deep multi-modal network that analyzes and classifies end-user questions/queries and then incorporates a query-specific approach for answer prediction. We refer to our proposed approach as Hierarchical Question Segregation based Visual Question…
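
The two-stage flow described in the abstract (classify the question first, then route it to a query-specific predictor) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the first-word keyword rule stands in for whatever question classifier the authors actually use, and `yes_no_model`, `descriptive_model`, and `MODELS` are hypothetical placeholders rather than the paper's networks.

```python
from typing import Callable, Dict, Sequence

# Assumption: a small set of leading words that usually signal a Yes/No question.
# This heuristic is a stand-in for a learned question classifier.
YES_NO_STARTERS = {
    "is", "are", "does", "do", "did", "was", "were",
    "can", "could", "has", "have", "will",
}


def segregate_question(question: str) -> str:
    """Return 'yes_no' or 'descriptive' based on the first word of the question."""
    words = question.strip().lower().split()
    if words and words[0] in YES_NO_STARTERS:
        return "yes_no"
    return "descriptive"


def yes_no_model(question: str, image_features: Sequence[float]) -> str:
    """Hypothetical placeholder for a binary answer predictor."""
    return "yes" if sum(image_features) > 0 else "no"


def descriptive_model(question: str, image_features: Sequence[float]) -> str:
    """Hypothetical placeholder for a free-form answer predictor."""
    return "<descriptive answer>"


# One predictor per question type; the top level of the hierarchy picks one.
MODELS: Dict[str, Callable[[str, Sequence[float]], str]] = {
    "yes_no": yes_no_model,
    "descriptive": descriptive_model,
}


def answer(question: str, image_features: Sequence[float],
           models: Dict[str, Callable[[str, Sequence[float]], str]]) -> str:
    """Classify the question, then dispatch to the matching predictor."""
    question_type = segregate_question(question)
    return models[question_type](question, image_features)
```

The point of the sketch is the dispatch structure: the user interacts with a single entry point (`answer`), while simple and complex questions are handled by separate predictors behind it.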

Code & Models

Repositories

Swati17293/HQS
