Visual Question Answering (VQA) on Images with Superimposed Text
Venkat Kodali, Daniel Berleant

TL;DR
This paper investigates the impact of superimposed text on medical images for Visual Question Answering (VQA). It shows that such textual annotations can be added without severely degrading key measures of VQA performance, which supports their use in medical AI applications.
Contribution
The study provides empirical evidence that superimposed text on medical images does not severely affect VQA accuracy, validating its use in healthcare image analysis.
Findings
Superimposed text does not severely degrade key measures of VQA performance.
Text annotations can be used effectively in medical image VQA.
Supports the integration of textual meta-information in medical AI systems.
Abstract
Superimposed text annotations have been under-investigated, yet are ubiquitous, useful and important, especially in medical images. Medical images also highlight the challenges posed by low resolution, noise and superimposed textual meta-information. Therefore, we probed the impact of superimposing text onto medical images on VQA. Our results revealed that this textual meta-information can be added without severely degrading key measures of VQA performance. Our findings are significant because they validate the practice of superimposing text on images, even for medical images subjected to the VQA task using AI techniques. The work helps advance understanding of VQA in general and, in particular, in the domain of healthcare and medicine.
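As a rough illustration of the comparison the abstract describes, the sketch below scores the same questions under two conditions (original images and images with superimposed text) and reports the change in accuracy. It is a minimal sketch under stated assumptions, not the authors' evaluation code: exact-match accuracy is assumed as the measure, the function names `accuracy` and `accuracy_drop` are hypothetical, and the answers are invented toy data rather than results from the paper.

```python
def accuracy(predictions, references):
    """Fraction of answers that match the reference after simple normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("need at least one example")
    matches = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return matches / len(references)


def accuracy_drop(clean_preds, overlay_preds, references):
    """Compare accuracy on original images vs. images with superimposed text."""
    clean = accuracy(clean_preds, references)
    overlay = accuracy(overlay_preds, references)
    return {"clean": clean, "overlay": overlay, "absolute_drop": clean - overlay}


# Toy data, invented for illustration only (not from the paper).
references = ["yes", "no", "left lung", "yes"]
clean_preds = ["yes", "no", "left lung", "no"]       # model answers, original images
overlay_preds = ["Yes", "no", "right lung", "no"]    # model answers, text superimposed

report = accuracy_drop(clean_preds, overlay_preds, references)
print(report)
```

A small `absolute_drop` across a real test set would correspond to the abstract's claim that text can be added without severely degrading performance; what counts as "small" depends on the measure and dataset, which this sketch does not specify.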
Taxonomy
Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
