Assessing the Robustness of Visual Question Answering Models
Jia-Hong Huang, Modar Alfadly, Bernard Ghanem, Marcel Worring

TL;DR
This paper introduces a new method to evaluate the robustness of Visual Question Answering models using semantically related questions as noise, along with a novel robustness measure and datasets, to better understand model resilience against input variations.
Contribution
It proposes a novel robustness evaluation method for VQA models using basic questions as noise, including a new ranking approach, robustness measure, and datasets.
Findings
The proposed method effectively analyzes VQA model robustness.
The ranking-based noise generation correlates with question similarity.
Experimental results validate the effectiveness of the evaluation approach.
Abstract
Deep neural networks have been playing an essential role in the task of Visual Question Answering (VQA). Until recently, their accuracy has been the main focus of research. Now there is a trend toward assessing the robustness of these models against adversarial attacks by evaluating the accuracy of these models under increasing levels of noisiness in the inputs of VQA models. In VQA, the attack can target the image and/or the proposed query question, dubbed main question, and yet there is a lack of proper analysis of this aspect of VQA. In this work, we propose a new method that uses semantically related questions, dubbed basic questions, acting as noise to evaluate the robustness of VQA models. We hypothesize that as the similarity of a basic question to the main question decreases, the level of noise increases. To generate a reasonable noise level for a given main question, we rank a…
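As a rough illustration of the hypothesis in the abstract (a basic question that is less similar to the main question acts as stronger noise), the sketch below ranks candidate basic questions by similarity to a main question. The abstract is cut off before it describes the paper's own ranking method, so everything here is an assumption: the bag-of-words cosine similarity is only a stand-in, and the function names, the example questions, and the `1 - similarity` noise score are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_basic_questions(main_question, candidates):
    """Return (question, similarity) pairs, most similar first.

    Assumption: similarity is bag-of-words cosine, a stand-in for
    whatever ranking method the paper actually uses.
    """
    main_vec = bag_of_words(main_question)
    scored = [(q, cosine_similarity(main_vec, bag_of_words(q))) for q in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical example questions, not taken from the paper's datasets.
main_question = "What color is the umbrella?"
candidates = [
    "Is it raining?",
    "What color is the umbrella handle?",
    "Is the umbrella open?",
]

ranked = rank_basic_questions(main_question, candidates)
for question, similarity in ranked:
    # Hypothetical noise score: lower similarity means a higher noise level.
    print(f"similarity={similarity:.2f}  noise={1 - similarity:.2f}  {question}")
```

Under this stand-in measure, questions further down the ranking would be treated as noisier inputs when probing how a VQA model's accuracy changes.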
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
