Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing

Vedika Agarwal, Rakshith Shetty, Mario Fritz

arXiv:1912.07538 · cs.CV · June 1, 2020

TL;DR

This paper introduces a method to analyze the robustness of VQA models to semantic visual variations and to reduce their reliance on spurious correlations, using semantic image editing and synthetic data generation, with a particular focus on counting questions.

Contribution

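It proposes a novel approach for measuring and improving VQA model robustness through automated semantic image manipulations. Invariant edits remove objects that are irrelevant to the question, so the answer should not change; covariant edits remove instances of the object being counted, so the answer should change accordingly. The edited images also serve as synthetic training data that counters spurious correlations.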
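The two kinds of edit can be illustrated with a minimal sketch of how an edit and its expected answer might be planned from object annotations. This is not the authors' code: `EditPlan`, `plan_invariant_edit` and `plan_covariant_count_edit` are hypothetical names, the set of objects mentioned in the question is assumed to be given, and the actual pixel-level object removal (inpainting) is out of scope.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EditPlan:
    removed: tuple          # object categories to remove from the image (hypothetical format)
    expected_answer: str    # answer a consistent model should give after the edit
    kind: str               # "invariant" or "covariant"


def plan_invariant_edit(question_objects: set, image_objects: list, answer: str) -> Optional[EditPlan]:
    """Pick one object category the question does not mention; removing it must not change the answer."""
    candidates = sorted({o for o in image_objects if o not in question_objects})
    if not candidates:
        return None  # every object is relevant to the question, so no safe invariant edit exists
    return EditPlan(removed=(candidates[0],), expected_answer=answer, kind="invariant")


def plan_covariant_count_edit(target: str, image_objects: list, count_answer: int, k: int = 1) -> Optional[EditPlan]:
    """Remove k instances of the counted category; the expected count drops by exactly k."""
    if image_objects.count(target) < k or count_answer < k:
        return None  # not enough instances to remove
    return EditPlan(removed=(target,) * k, expected_answer=str(count_answer - k), kind="covariant")
```

For an image annotated with `["dog", "dog", "frisbee", "tree"]` and the question "How many dogs are there?" (answer 2), the invariant plan removes the frisbee and keeps the answer "2", while the covariant plan removes one dog and expects "1".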

Findings

1. Training on the semantically edited data makes models more robust.
2. Semantic image manipulations reveal inconsistencies in model predictions.
3. The gains carry over to real-world error cases, improving overall performance.

Abstract

Despite significant success in Visual Question Answering (VQA), VQA models have been shown to be notoriously brittle to linguistic variations in the questions. Due to deficiencies in models and datasets, today's models often rely on correlations rather than on predictions that are causal with respect to the data. In this paper, we propose a novel way to analyze and measure the robustness of state-of-the-art models w.r.t. semantic visual variations, as well as ways to make models more robust against spurious correlations. Our method performs automated semantic image manipulations and tests for consistency in model predictions to quantify model robustness, and it generates synthetic data to counter these problems. We perform our analysis on three diverse, state-of-the-art VQA models and diverse question types, with a particular focus on challenging counting questions. In addition, we show…
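The consistency test described in the abstract can be sketched as simple metrics over aligned predictions on original and edited images. This is an illustrative sketch, not the paper's evaluation code: `flip_rate`, `flip_breakdown` and `covariant_consistency` are hypothetical names, predictions are assumed to be normalized answer strings (integers for counts), and the correctness breakdown is one plausible way to categorize flips.

```python
from collections import Counter
from typing import Dict, Sequence


def flip_rate(original: Sequence[str], edited: Sequence[str]) -> float:
    """Fraction of invariant edits whose predicted answer changed (lower means more robust)."""
    if len(original) != len(edited):
        raise ValueError("original and edited predictions must be aligned")
    if not original:
        return 0.0
    flips = sum(o != e for o, e in zip(original, edited))
    return flips / len(original)


def flip_breakdown(gt: Sequence[str], original: Sequence[str], edited: Sequence[str]) -> Dict[str, int]:
    """Categorize each flip by correctness before and after an invariant edit.

    An invariant edit leaves the ground-truth answer unchanged, so `gt` applies to both images.
    """
    counts: Counter = Counter()
    for g, o, e in zip(gt, original, edited):
        if o == e:
            continue  # consistent prediction, not a flip
        before = "correct" if o == g else "wrong"
        after = "correct" if e == g else "wrong"
        counts[f"{before}->{after}"] += 1
    return dict(counts)


def covariant_consistency(original: Sequence[int], edited: Sequence[int], removed: Sequence[int]) -> float:
    """Fraction of counting edits where the predicted count dropped by exactly the number of removed instances."""
    if not (len(original) == len(edited) == len(removed)):
        raise ValueError("inputs must be aligned")
    if not original:
        return 0.0
    consistent = sum(o - r == e for o, e, r in zip(original, edited, removed))
    return consistent / len(original)
```

With ground truth `["2", "yes", "red", "3"]`, original predictions `["2", "yes", "blue", "3"]` and edited predictions `["1", "yes", "red", "3"]`, two of four answers flip (one correct-to-wrong, one wrong-to-correct), giving a flip rate of 0.5. Counting the wrong-to-correct case as a flip matters: a model that happens to improve after an irrelevant edit is still inconsistent.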
