COVR: A test-bed for Visually Grounded Compositional Generalization with real images

Ben Bogin; Shivanshu Gupta; Matt Gardner; Jonathan Berant

arXiv:2109.10613 · cs.CL · September 23, 2021

TL;DR

COVR introduces a new real-image benchmark for evaluating compositional generalization in visual reasoning, highlighting challenges faced by current models in zero- and few-shot scenarios with complex reasoning tasks.

Contribution

This work presents COVR, a novel, automatically generated dataset for real-image visual reasoning, enabling systematic evaluation of compositional generalization in models.

Findings

01

State-of-the-art models struggle with compositional generalization on COVR.

02

COVR enables creation of diverse, challenging test splits for compositional reasoning.

03

Models show limited zero- and few-shot learning capabilities on complex visual reasoning tasks.
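The findings above refer to compositional test splits evaluated in zero- and few-shot settings. As a rough illustration only, and assuming a hypothetical record format (the `Example` class with `operators` and `concepts` fields is not COVR's actual data format), such a split can send every question that matches a chosen composition to test while optionally keeping k of them in training for the few-shot case:

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Example:
    # Hypothetical record; COVR's real format may differ.
    question: str
    operators: FrozenSet[str]  # reasoning operators the question uses
    concepts: FrozenSet[str]   # object and attribute names it mentions


def compositional_split(
    examples: List[Example],
    is_held_out: Callable[[Example], bool],
    few_shot_k: int = 0,
) -> Tuple[List[Example], List[Example]]:
    """Send examples matching the held-out composition to test.

    With few_shot_k > 0, the first k matching examples stay in train
    (few-shot setting); with few_shot_k == 0 the split is zero-shot.
    """
    train: List[Example] = []
    test: List[Example] = []
    kept = 0
    for ex in examples:
        if is_held_out(ex):
            if kept < few_shot_k:
                train.append(ex)  # one of the k few-shot examples
                kept += 1
            else:
                test.append(ex)
        else:
            train.append(ex)
    return train, test


examples = [
    Example("Are all dogs brown?", frozenset({"all"}), frozenset({"dog", "brown"})),
    Example("How many dogs are brown?", frozenset({"count"}), frozenset({"dog", "brown"})),
    Example("Are all cats brown?", frozenset({"all"}), frozenset({"cat", "brown"})),
    Example("Are all cats white?", frozenset({"all"}), frozenset({"cat", "white"})),
]


def held_out(ex: Example) -> bool:
    # Hypothetical held-out composition: universal quantifier + attribute "brown".
    return "all" in ex.operators and "brown" in ex.concepts


zs_train, zs_test = compositional_split(examples, held_out)
fs_train, fs_test = compositional_split(examples, held_out, few_shot_k=1)
```

In the zero-shot split, both atoms still appear in training ("brown" in the counting question, "all" in the white-cat question); only their combination is unseen, which is what makes the split compositional rather than simply out-of-vocabulary.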

Abstract

While interest in models that generalize at test time to new compositions has risen in recent years, benchmarks in the visually-grounded domain have thus far been restricted to synthetic images. In this work, we propose COVR, a new test-bed for visually-grounded compositional generalization with real images. To create COVR, we use real images annotated with scene graphs, and propose an almost fully automatic procedure for generating question-answer pairs along with a set of context images. COVR focuses on questions that require complex reasoning, including higher-order operations such as quantification and aggregation. Due to the automatic generation process, COVR facilitates the creation of compositional splits, where models at test time need to generalize to new concepts and compositions in a zero- or few-shot setting. We construct compositional splits using COVR and demonstrate a…
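The abstract describes questions generated from scene graphs that require higher-order operations such as quantification and aggregation over a set of context images. The following is a minimal sketch of how such operators might be evaluated. The simplified scene-graph format, the helper names (`make_obj`, `select`, `count`, `forall`) and the convention that "all" over an empty set is False are illustrative assumptions, not COVR's actual representation or generation code.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical simplified scene graph: an image is a list of objects, each
# with a name, a set of attributes, and (predicate, object_name) relations.
Obj = Dict[str, object]
Relation = Tuple[str, str]


def make_obj(name: str, attributes: Iterable[str] = (),
             relations: Iterable[Relation] = ()) -> Obj:
    return {"name": name, "attributes": set(attributes), "relations": list(relations)}


def select(images: List[List[Obj]], name: str,
           attribute: Optional[str] = None,
           relation: Optional[Relation] = None) -> List[Obj]:
    """Collect objects called `name` across all images, optionally filtered."""
    found = []
    for graph in images:
        for obj in graph:
            if obj["name"] != name:
                continue
            if attribute is not None and attribute not in obj["attributes"]:
                continue
            if relation is not None and relation not in obj["relations"]:
                continue
            found.append(obj)
    return found


def count(objs: List[Obj]) -> int:
    # Aggregation: number of matching objects.
    return len(objs)


def forall(objs: List[Obj], attribute: str) -> bool:
    # Universal quantification; assumed False when nothing matches.
    return len(objs) > 0 and all(attribute in o["attributes"] for o in objs)


# Three hypothetical context images.
images = [
    [make_obj("dog", {"brown"}), make_obj("man", relations=[("riding", "horse")]),
     make_obj("horse", {"white"})],
    [make_obj("dog", {"black"}), make_obj("man")],
    [make_obj("dog", {"brown"}), make_obj("man", relations=[("riding", "horse")]),
     make_obj("horse", {"brown"})],
]

# "Are all dogs brown?" -> False (the second image has a black dog)
all_dogs_brown = forall(select(images, "dog"), "brown")
# "How many men are riding a horse?" -> 2
men_riding = count(select(images, "man", relation=("riding", "horse")))
# "How many images contain a brown dog?" -> 2
images_with_brown_dog = sum(1 for g in images if select([g], "dog", "brown"))
```

Real scene graphs link relations to specific object instances rather than to names; keying relations by object name here only keeps the sketch short.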

Code & Models

Repositories

benbogin/covr-dataset (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition

Methods: Test