Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions

Jia-Hong Huang; Modar Alfadly; Bernard Ghanem; Marcel Worring

arXiv:2304.03147 · cs.CV · April 7, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a new robustness evaluation method for VQA models using semantically related basic questions as noise, and shows that in-context learning with these questions can improve accuracy.

Contribution

It proposes a novel robustness measure, R_score, and a LASSO-based method for ranking basic questions, and it demonstrates the effectiveness of in-context learning with basic questions.
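The LASSO-based ranking can be illustrated with a minimal sketch, assuming that the main question and each candidate basic question have already been encoded as fixed-length vectors (the encoder is not specified here, so the toy vectors below are placeholders). Under that assumption, the main question's vector is approximated as a sparse linear combination of the basic-question vectors, and basic questions are ranked by the magnitude of their LASSO weights. The helper names `lasso_ista` and `rank_basic_questions`, the penalty `lam`, and the use of ISTA as the solver are illustrative choices, not details taken from the paper.

```python
import numpy as np


def lasso_ista(A, b, lam, n_iter=500):
    """Approximately solve min_x 0.5*||A x - b||^2 + lam*||x||_1 by ISTA (proximal gradient)."""
    # Step size 1/L, where L = (largest singular value of A)^2,
    # the Lipschitz constant of the least-squares gradient.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        z = x - grad / L
        # Soft-thresholding is the proximal operator of the L1 penalty.
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x


def rank_basic_questions(main_vec, basic_vecs, lam=0.1):
    """Hypothetical helper: rank basic questions by |LASSO weight| in reconstructing the main question."""
    A = np.stack(basic_vecs, axis=1)  # one column per basic-question embedding
    x = lasso_ista(A, main_vec, lam)
    order = np.argsort(-np.abs(x), kind="stable")  # largest weight first
    return order, x


if __name__ == "__main__":
    # Toy placeholder embeddings: four orthonormal basic-question vectors in R^5.
    basic = list(np.eye(5)[:4])
    main = np.array([0.9, 0.0, 0.3, 0.0, 0.0])
    order, weights = rank_basic_questions(main, basic, lam=0.1)
    print(order.tolist())  # → [0, 2, 1, 3]
```

With orthonormal toy embeddings the LASSO solution reduces to soft-thresholding, so the example can be checked by hand; real sentence embeddings would be dense and correlated, and `lam` would need tuning.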

Findings

01. The proposed method effectively evaluates VQA model robustness.
02. In-context learning with basic questions enhances model accuracy.
03. The new robustness measure R_score provides a standardized evaluation metric.

Abstract

Deep neural networks have been critical in the task of Visual Question Answering (VQA), with research traditionally focused on improving model accuracy. Recently, however, there has been a trend towards evaluating the robustness of these models against adversarial attacks. This involves assessing the accuracy of VQA models under increasing levels of noise in the input, which can target either the image or the proposed query question, dubbed the main question. However, there is currently a lack of proper analysis of this aspect of VQA. This work proposes a new method that utilizes semantically related questions, referred to as basic questions, acting as noise to evaluate the robustness of VQA models. It is hypothesized that as the similarity of a basic question to the main question decreases, the level of noise increases. To generate a reasonable noise level for a given main question, a…
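The hypothesis that noise grows as a basic question's similarity to the main question falls can be turned into discrete noise levels. The sketch below assumes cosine similarity between question embeddings as the similarity measure and a hypothetical `group_size` of 3; both are assumptions for illustration, not settings stated here. Basic questions are sorted from most to least similar and split into consecutive groups, with group 0 treated as the lowest noise level.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def noise_levels(main_vec, basic_vecs, group_size=3):
    """Hypothetical helper: partition basic questions into noise levels.

    Group k holds the k-th block of basic questions when sorted by decreasing
    similarity to the main question, so higher k means more noise.
    """
    sims = [cosine_similarity(main_vec, b) for b in basic_vecs]
    order = sorted(range(len(basic_vecs)), key=lambda i: -sims[i])  # stable sort
    groups = [order[k:k + group_size] for k in range(0, len(order), group_size)]
    return groups, sims


if __name__ == "__main__":
    # Placeholder 2-D embeddings standing in for encoded questions.
    main = np.array([1.0, 0.0])
    basic = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
             np.array([1.0, 1.0]), np.array([-1.0, 0.0])]
    groups, _ = noise_levels(main, basic, group_size=2)
    print(groups)  # → [[0, 2], [1, 3]]
```

Evaluating a VQA model on the main question together with each group in turn, and tracking how accuracy changes from level to level, is one way to read the "increasing levels of noise" setup described in the abstract.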

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning