Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models

Yeongbin Kim; Gautam Singh; Junyeong Park; Caglar Gulcehre; Sungjin Ahn

arXiv:2311.09064 · cs.CV · November 16, 2023 · 1 citation

TL;DR

This paper introduces SVIB, a new benchmark for evaluating models' ability to perform systematic visual imagination through one-step image transformations, aiming to advance compositional understanding in visual models.

Contribution

The paper presents SVIB, the first benchmark for systematic visual imagination, enabling evaluation and development of models that understand and generate dynamic visual transformations.

Findings

01. Baseline models show limited systematic generalization.

02. SVIB's difficulty levels reveal current model limitations.

03. Controllable factor combinations impact model performance.

Abstract

Systematic compositionality, or the ability to adapt to novel situations by creating a mental model of the world using reusable pieces of knowledge, remains a significant challenge in machine learning. While there has been considerable progress in the language domain, efforts towards systematic visual imagination, or envisioning the dynamical implications of a visual observation, are in their infancy. We introduce the Systematic Visual Imagination Benchmark (SVIB), the first benchmark designed to address this problem head-on. SVIB offers a novel framework for a minimal world modeling problem, where models are evaluated based on their ability to generate one-step image-to-image transformations under a latent world dynamics. The framework provides benefits such as the possibility to jointly optimize for systematic perception and imagination, a range of difficulty levels, and the ability…
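The evaluation idea in the abstract can be illustrated with a toy, purely symbolic sketch. It is not SVIB's actual data pipeline: the factor vocabularies, the `step` rule, and the `split_combinations` helper are hypothetical names invented for this example, and objects are (shape, color) tuples rather than images. The sketch assumes that "systematic" means every individual factor value is seen in training while some factor combinations are held out for testing, and that a difficulty level corresponds to how many combinations are held out.

```python
from itertools import product

# Hypothetical factor vocabularies (assumption: not taken from the benchmark).
# This toy split requires both lists to have the same length.
SHAPES = ["cube", "sphere", "cylinder"]
COLORS = ["red", "green", "blue"]


def split_combinations(held_out_diagonals):
    """Hold out whole 'diagonals' of the shape x color grid for testing.

    Each diagonal removes exactly one combination per shape and per color,
    so every single factor value still appears in training as long as
    fewer than len(SHAPES) diagonals are held out.
    """
    n = len(SHAPES)
    if not 0 < held_out_diagonals < n:
        raise ValueError("held_out_diagonals must be between 1 and n - 1")
    test = {
        (SHAPES[i], COLORS[(i + k) % n])
        for i in range(n)
        for k in range(held_out_diagonals)
    }
    train = [combo for combo in product(SHAPES, COLORS) if combo not in test]
    return train, sorted(test)


def step(obj):
    """Toy stand-in for a latent one-step rule: the color cycles forward."""
    shape, color = obj
    next_color = COLORS[(COLORS.index(color) + 1) % len(COLORS)]
    return (shape, next_color)


def make_pairs(combos):
    """Build (input, target) pairs, the symbolic analogue of image-to-image pairs."""
    return [(combo, step(combo)) for combo in combos]


# More held-out diagonals means fewer training combinations (a harder split).
train, test = split_combinations(held_out_diagonals=1)
train_pairs = make_pairs(train)
test_pairs = make_pairs(test)
```

A model trained on `train_pairs` would then be scored on `test_pairs`, whose inputs combine familiar shapes and colors in ways never seen during training.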

Videos

Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models (SlidesLive)

Taxonomy

Topics: Multimodal Machine Learning Applications