COVR: A test-bed for Visually Grounded Compositional Generalization with real images
Ben Bogin, Shivanshu Gupta, Matt Gardner, Jonathan Berant

TL;DR
COVR introduces a new real-image benchmark for evaluating compositional generalization in visual reasoning, highlighting challenges faced by current models in zero- and few-shot scenarios with complex reasoning tasks.
Contribution
This work presents COVR, a dataset for visual reasoning over real images, generated by an almost fully automatic procedure from scene-graph annotations, which enables systematic evaluation of compositional generalization in models.
Findings
State-of-the-art models struggle with compositional generalization on COVR.
COVR enables creation of diverse, challenging test splits for compositional reasoning.
Models show limited zero- and few-shot learning capabilities on complex visual reasoning tasks.
Abstract
While interest in models that generalize at test time to new compositions has risen in recent years, benchmarks in the visually-grounded domain have thus far been restricted to synthetic images. In this work, we propose COVR, a new test-bed for visually-grounded compositional generalization with real images. To create COVR, we use real images annotated with scene graphs, and propose an almost fully automatic procedure for generating question-answer pairs along with a set of context images. COVR focuses on questions that require complex reasoning, including higher-order operations such as quantification and aggregation. Due to the automatic generation process, COVR facilitates the creation of compositional splits, where models at test time need to generalize to new concepts and compositions in a zero- or few-shot setting. We construct compositional splits using COVR and demonstrate a…
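The idea of a compositional split can be illustrated with a minimal sketch. It assumes each generated question carries a set of tags naming the reasoning operations it uses. The tag names, the record fields, and the `compositional_split` helper are hypothetical and are not taken from the COVR release. Examples whose tags contain every held-out tag go to the test set, so each operation can still be seen on its own during training while their combination appears only at test time.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical record format: "tags" lists the reasoning operations a
# question uses. This is an illustrative schema, not COVR's actual one.
Example = Dict[str, Any]


def compositional_split(
    examples: List[Example], held_out: Set[str]
) -> Tuple[List[Example], List[Example]]:
    """Send examples whose tags include ALL of `held_out` to the test set."""
    train: List[Example] = []
    test: List[Example] = []
    for ex in examples:
        tags = set(ex["tags"])
        # held_out <= tags is True when every held-out tag is present.
        if held_out <= tags:
            test.append(ex)
        else:
            train.append(ex)
    return train, test


# Toy questions invented for this sketch.
examples: List[Example] = [
    {"question": "Are all dogs brown?", "tags": {"quantifier", "attribute"}},
    {"question": "How many images show a cat on a sofa?", "tags": {"count", "relation"}},
    {"question": "Do all images show a dog on a sofa?", "tags": {"quantifier", "relation"}},
    {"question": "How many cats are black?", "tags": {"count", "attribute"}},
]

# Hold out the combination of quantification with a relation.
train, test = compositional_split(examples, {"quantifier", "relation"})

# Tags that still occur somewhere in the training set.
train_tags: Set[str] = set().union(*(ex["tags"] for ex in train))
```

In this toy setup the training set still contains "quantifier" and "relation" separately, so a model evaluated on the test set faces a zero-shot composition of familiar parts. Moving a handful of held-out examples back into training would give a few-shot variant.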
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
Methods: Test
