Measuring CLEVRness: Blackbox testing of Visual Reasoning Models

Spyridon Mouselinos; Henryk Michalewski; Mateusz Malinowski

arXiv:2202.12162·cs.LG·March 1, 2022·1 cites

TL;DR

This paper introduces a behavioral testing framework for visual reasoning models: a two-player game in which an adversary reconfigures scenes to test whether models truly reason or instead exploit dataset biases. The test reveals limitations in current models.

Contribution

It proposes a novel black-box testing method in which an adversary reconfigures the scene to evaluate the reasoning abilities of visual QA models.

Findings

01. CLEVR models can be easily fooled by adversarial reconfigurations.

02. Current models may rely on dataset biases rather than true reasoning.

03. The method provides a controlled way to measure reasoning efficiency.

Abstract

How can we measure the reasoning capabilities of intelligent systems? Visual question answering provides a convenient framework for testing a model's abilities by interrogating the model through questions about the scene. However, despite scores of various visual QA datasets and architectures, which sometimes yield even super-human performance, the question of whether those architectures can actually reason remains open to debate. To answer this, we extend the visual question answering framework and propose the following behavioral test in the form of a two-player game. We consider black-box neural models of CLEVR. These models are trained on a diagnostic dataset benchmarking reasoning. Next, we train an adversarial player that re-configures the scene to fool the CLEVR model. We show that CLEVR models, which otherwise could perform at a human level, can easily be fooled by our…
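
The two-player game described in the abstract can be illustrated with a toy sketch. This is not the authors' method: the paper trains an adversarial player, whereas the sketch below swaps in plain random search, and every name in it (`Obj`, `biased_model`, `reconfigure`, `attack`) is hypothetical. The stand-in "model" is a deliberately biased counter that ignores objects on the right half of the scene. The adversary only moves objects, so the correct answer to a counting question never changes; any change in the model's answer is therefore a failure of the model.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Obj:
    """A hypothetical scene object with attributes and a 2D position in [0, 1)."""
    color: str
    shape: str
    x: float
    y: float


Scene = List[Obj]
Model = Callable[[Scene, str], int]


def ground_truth_count(scene: Scene, color: str) -> int:
    """Correct answer to 'how many <color> objects are there?'."""
    return sum(1 for o in scene if o.color == color)


def biased_model(scene: Scene, color: str) -> int:
    """Stand-in black box (an assumption): it misses objects with x >= 0.5."""
    return sum(1 for o in scene if o.color == color and o.x < 0.5)


def reconfigure(scene: Scene, rng: random.Random) -> Scene:
    """Move every object to a random position; colors and shapes are untouched."""
    return [replace(o, x=rng.random(), y=rng.random()) for o in scene]


def attack(model: Model, scene: Scene, color: str,
           budget: int = 100, seed: int = 0) -> Optional[Scene]:
    """Random-search adversary: query the model as a black box and return the
    first reconfigured scene on which its answer is wrong, or None if the
    query budget runs out."""
    rng = random.Random(seed)
    truth = ground_truth_count(scene, color)
    for _ in range(budget):
        candidate = reconfigure(scene, rng)
        if model(candidate, color) != truth:
            return candidate
    return None


scene = [Obj("red", "cube", 0.1, 0.2), Obj("red", "sphere", 0.2, 0.7),
         Obj("blue", "cube", 0.3, 0.4)]
fooled = attack(biased_model, scene, "red")
print("model fooled:", fooled is not None)
```

On the original scene the biased counter answers correctly, which mirrors the situation in the abstract: a model can look accurate on familiar configurations and still fail once the scene is rearranged. A model that actually counts (for example, `ground_truth_count` itself) cannot be fooled by this adversary, since moving objects never changes the count.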

Videos

Measuring CLEVRness: Black-box Testing of Visual Reasoning Models · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition