LaPA: Latent Prompt Assist Model For Medical Visual Question Answering
Tiancheng Gu, Kaicheng Yang, Dongnan Liu, Weidong Cai

TL;DR
This paper introduces LaPA, a novel model for medical visual question answering that leverages latent prompts and clinical knowledge fusion to improve answer accuracy on medical image datasets.
Contribution
The paper proposes a latent prompt generation module and a multi-modal fusion block that together improve the extraction of clinically relevant information in Med-VQA tasks, outperforming existing models on three Med-VQA datasets.
Findings
LaPA achieves higher accuracy than state-of-the-art models on three Med-VQA datasets.
The latent prompt generation module produces prompts constrained by the target answer, which the fusion block then uses to extract clinically relevant information.
Incorporating prior knowledge improves the model's interpretability and performance.
Abstract
Medical visual question answering (Med-VQA) aims to automate the prediction of correct answers for medical images and questions, thereby assisting physicians in reducing repetitive tasks and alleviating their workload. Existing approaches primarily focus on pre-training models using additional and comprehensive datasets, followed by fine-tuning to enhance performance in downstream tasks. However, there is also significant value in exploring existing models to extract clinically relevant information. In this paper, we propose the Latent Prompt Assist model (LaPA) for medical visual question answering. First, we design a latent prompt generation module to generate the latent prompt with the constraint of the target answer. Subsequently, we propose a multi-modal fusion block with a latent prompt fusion module that utilizes the latent prompt to extract clinically relevant information from…
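To make the idea of a latent prompt concrete, here is a minimal sketch of a small set of prompt vectors attending over image and question features through scaled dot-product cross-attention. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`softmax`, `cross_attend`, `random_tokens`), the token counts, and the feature dimension are all hypothetical, the vectors are random rather than learned, and the target-answer constraint described in the abstract is not modeled.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, tokens):
    """Scaled dot-product cross-attention without learned projections.

    Each query vector yields an attention-weighted average of the token vectors.
    """
    dim = len(tokens[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, t)) / math.sqrt(dim) for t in tokens]
        weights = softmax(scores)
        outputs.append(
            [sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(dim)]
        )
    return outputs


rng = random.Random(0)


def random_tokens(count, dim):
    """Stand-in for encoder outputs or learned parameters (assumption)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


DIM = 8                                  # hypothetical feature dimension
latent_prompt = random_tokens(4, DIM)    # hypothetical: 4 prompt vectors, learned in a real model
image_tokens = random_tokens(6, DIM)     # hypothetical image-encoder features
question_tokens = random_tokens(5, DIM)  # hypothetical question-encoder features

# The prompt vectors query the combined image and question tokens.
prompted = cross_attend(latent_prompt, image_tokens + question_tokens)
print(len(prompted), len(prompted[0]))
```

In this sketch the output has one vector per prompt vector, each a weighted mix of the image and question features. A trained model would add learned projections and a training signal tied to the answer, which are omitted here.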
Taxonomy
Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Text and Document Classification Technologies
