Visual Question Answering From Another Perspective: CLEVR Mental Rotation Tests

Christopher Beckham; Martin Weiss; Florian Golemo; Sina Honari; Derek Nowrouzezahrai; Christopher Pal

arXiv:2212.01639·stat.ML·December 6, 2022

TL;DR

This paper introduces CLEVR-MRT, a new dataset for testing visual reasoning involving mental rotation, and evaluates neural models that infer and manipulate volumetric scene representations to answer viewpoint-based questions.

Contribution

The paper presents CLEVR-MRT, a novel dataset for mental rotation tasks, and proposes neural architectures that utilize volumetric scene representations for viewpoint reasoning.

Findings

1. Volumetric representations improve reasoning accuracy.
2. Standard methods underperform on mental rotation tasks.
3. Neural models with scene manipulation outperform baseline approaches.

Abstract

Different types of mental rotation tests have been used extensively in psychology to understand human visual reasoning and perception. Understanding what an object or visual scene would look like from another viewpoint is a challenging problem that is made even harder if it must be performed from a single image. We explore a controlled setting whereby questions are posed about the properties of a scene if that scene was observed from another viewpoint. To do this we have created a new version of the CLEVR dataset that we call CLEVR Mental Rotation Tests (CLEVR-MRT). Using CLEVR-MRT we examine standard methods, show how they fall short, then explore novel neural architectures that involve inferring volumetric representations of a scene. These volumes can be manipulated via camera-conditioned transformations to answer the question. We examine the efficacy of different model variants…
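To make the idea of a camera-conditioned transformation of a volume concrete, here is a minimal sketch in plain Python. It is not the paper's method: the abstract describes learned volumetric representations and learned transformations, whereas this example applies a fixed geometric rotation about the vertical axis to a small voxel grid using nearest-neighbour sampling. The function name `rotate_volume_about_y`, the `volume[z][y][x]` layout, and the zero fill for out-of-range voxels are all assumptions made for illustration.

```python
import math


def rotate_volume_about_y(volume, angle_deg):
    """Resample a cubic voxel grid rotated about its vertical (y) axis.

    Hypothetical helper for illustration only. volume[z][y][x] holds one
    feature value per voxel. Sampling is nearest-neighbour, and output
    voxels whose source falls outside the grid are filled with 0.0.
    """
    n = len(volume)
    c = (n - 1) / 2.0  # centre of the grid along each axis
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    out = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for z in range(n):
        for y in range(n):
            for x in range(n):
                # Inverse mapping: locate the source voxel for each output voxel.
                dx, dz = x - c, z - c
                src_x = round(cos_t * dx + sin_t * dz + c)
                src_z = round(-sin_t * dx + cos_t * dz + c)
                if 0 <= src_x < n and 0 <= src_z < n:
                    out[z][y][x] = volume[src_z][y][src_x]
    return out


# Usage: a 3x3x3 grid with a single occupied voxel, viewed after a
# quarter turn of the (assumed) camera around the vertical axis.
grid = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
grid[1][1][2] = 1.0
rotated = rotate_volume_about_y(grid, 90)
```

In a learned model the grid would hold feature vectors inferred from the input image, and the rotation angle would come from the queried camera pose. A differentiable resampler would typically replace the nearest-neighbour lookup so gradients can flow through the transformation.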

Code & Models

Repositories

christopher-beckham/clevr-mrt (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization