Transformation Driven Visual Reasoning

Xin Hong; Yanyan Lan; Liang Pang; Jiafeng Guo; Xueqi Cheng

arXiv:2011.13160 · cs.CV · April 5, 2021

TL;DR

This paper introduces transformation driven visual reasoning, a paradigm and dataset (TRANCE) in which a model must infer the changes between an initial and a final state, a task that challenges models built for reasoning over a single static image.

Contribution

It proposes a new reasoning task and dataset focusing on transformations between states, extending beyond static concept understanding in visual reasoning.

Findings

01

Models perform well on single-step transformations but struggle with multi-step transformations and with changes in viewpoint between states.

02

The dataset TRANCE enables evaluation of dynamic reasoning capabilities.

03

Current models are far from human-level performance on complex transformation tasks.

Abstract

This paper defines a new visual reasoning paradigm by introducing an important factor, i.e., transformation. The motivation comes from the fact that most existing visual reasoning tasks, such as CLEVR in VQA, are solely defined to test how well the machine understands the concepts and relations within static settings, like one image. We argue that this kind of state driven visual reasoning approach has limitations in reflecting whether the machine has the ability to infer the dynamics between different states, which has been shown to be as important as state-level reasoning for human cognition in Piaget's theory. To tackle this problem, we propose a novel transformation driven visual reasoning task. Given both the initial and final states, the target is to infer the corresponding single-step or multi-step transformation, represented as a triplet (object, attribute, value) or…
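As a rough illustration of the triplet representation described in the abstract, the Python sketch below models a state as a set of objects with attributes and a transformation as an (object, attribute, value) triplet. It is not the authors' method or the TRANCE data format: the attribute vocabulary, the symbolic (non-image) states, and the helper names `apply_transformation`, `infer_single_step`, and `diff_states` are all assumptions made for illustration.

```python
# Hypothetical attribute vocabulary; the real dataset's attributes may differ.
ATTRIBUTE_VALUES = {
    "color": ["red", "blue", "green"],
    "size": ["small", "large"],
}


def apply_transformation(state, triplet):
    """Return a copy of `state` with one (object, attribute, value) change applied."""
    obj, attr, value = triplet
    new_state = {key: dict(attrs) for key, attrs in state.items()}
    new_state[obj][attr] = value
    return new_state


def infer_single_step(initial, final):
    """Brute-force search for every single triplet that maps `initial` to `final`."""
    matches = []
    for obj in initial:
        for attr, values in ATTRIBUTE_VALUES.items():
            for value in values:
                if value == initial[obj][attr]:
                    continue  # not a change
                if apply_transformation(initial, (obj, attr, value)) == final:
                    matches.append((obj, attr, value))
    return matches


def diff_states(initial, final):
    """List one triplet per attribute that differs between the two states.

    This recovers a set of net changes, not the order in which they happened.
    """
    return [
        (obj, attr, final[obj][attr])
        for obj in initial
        for attr in initial[obj]
        if initial[obj][attr] != final[obj][attr]
    ]


initial = {
    0: {"color": "red", "size": "small"},
    1: {"color": "blue", "size": "large"},
}
one_step = apply_transformation(initial, (1, "color", "green"))
two_step = apply_transformation(one_step, (0, "size", "large"))

print(infer_single_step(initial, one_step))
print(diff_states(initial, two_step))
```

In this toy setting the states are fully observed symbols, so inference reduces to a diff. The task described in the abstract is harder because the states are given as images, so a model has to perceive the objects and attributes before it can reason about what changed.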

Code & Models

Repositories

hughplay/TVR (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning