When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes
Philipp Bomatter, Mengmi Zhang, Dimitar Karev, Spandan Madan, Claire Tseng, Gabriel Kreiman

TL;DR
This paper introduces a synthetic Out-of-Context Dataset (OCD) and a context-aware transformer model to study and improve object recognition in out-of-context scenes, comparing human and machine performance.
Contribution
The work presents a controllable synthetic dataset for scene context, along with a novel transformer-based model that enhances out-of-context recognition performance.
Findings
A human benchmark is established for out-of-context recognition.
The proposed model achieves human-level performance on the dataset.
The model is more robust than the baseline models.
Abstract
Context is of fundamental importance to both human and machine vision; e.g., an object in the air is more likely to be an airplane than a pig. The rich notion of context incorporates several aspects including physics rules, statistical co-occurrences, and relative object sizes, among others. While previous work has focused on crowd-sourced out-of-context photographs from the web to study scene context, controlling the nature and extent of contextual violations has been a daunting task. Here we introduce a diverse, synthetic Out-of-Context Dataset (OCD) with fine-grained control over scene context. By leveraging a 3D simulation engine, we systematically control the gravity, object co-occurrences and relative sizes across 36 object categories in a virtual household environment. We conducted a series of experiments to gain insights into the impact of contextual cues on both human and…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
