Exploring Weaknesses of VQA Models through Attribution Driven Insights
Shaunak Halbe

TL;DR
This paper analyzes Visual Question Answering (VQA) models through attribution (the influence of each input on the prediction). The analysis exposes weaknesses that adversarial attacks on the input question can exploit, which the paper presents as motivation for building models that are more robust in real-world use.
Contribution
The paper applies attribution-driven analysis to popular VQA models and uses the resulting insights to craft adversarial attacks on input questions, with the aim of encouraging the development of more robust models.
Findings
VQA models are sensitive to variations in the input question, despite achieving high accuracy.
Attribution analysis reveals which input features most influence predictions.
Adversarial attacks can significantly degrade VQA performance with negligible change in the meaning of the input question.
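To make the idea of attribution concrete, here is a minimal sketch in Python. It uses leave-one-out (occlusion) attribution, which is a simple stand-in chosen for illustration and not necessarily the attribution method used in the paper. The scoring function `toy_score` and its weights are hypothetical placeholders for a real VQA model's confidence in its answer.

```python
from typing import Callable, List, Tuple


def leave_one_out_attribution(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Attribute a score to each token as the drop in score when it is removed."""
    base = score_fn(tokens)
    attributions = []
    for i, tok in enumerate(tokens):
        ablated = tokens[:i] + tokens[i + 1:]  # question without token i
        attributions.append((tok, base - score_fn(ablated)))
    return attributions


# Hypothetical stand-in for a VQA model: a fixed weight per question word.
# A real model would also take the image and return an answer confidence.
TOY_WEIGHTS = {"what": 0.05, "color": 0.6, "bus": 0.3}


def toy_score(tokens: List[str]) -> float:
    return sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)


question = "what color is the bus".split()
attrs = leave_one_out_attribution(question, toy_score)

# Rank words by how much the toy model relies on them.
for tok, value in sorted(attrs, key=lambda p: p[1], reverse=True):
    print(f"{tok:>6s}  {value:.2f}")
```

In this toy setting, words with near-zero attribution ("is", "the") are ones the model effectively ignores, while a few words carry most of the score. An analysis of this kind indicates which parts of a question a model relies on, and therefore where a meaning-preserving rewording is most likely to change its answer.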
Abstract
Deep Neural Networks have been successfully used for the task of Visual Question Answering for the past few years, owing to the availability of relevant large-scale datasets. However, these datasets are created in artificial settings and rarely reflect real-world scenarios. Recent research effectively applies these VQA models to answering visual questions for the blind. Despite achieving high accuracy, these models appear to be susceptible to variation in input questions. We analyze popular VQA models through the lens of attribution (the input's influence on predictions) to gain valuable insights. Further, we use these insights to craft adversarial attacks which inflict significant damage on these systems with negligible change in the meaning of the input questions. We believe this will enhance development of systems more robust to the possible variations in inputs when deployed to assist the…
