Learning to See the Elephant in the Room: Self-Supervised Context Reasoning in Humans and AI

Xiao Liu; Soumick Sarker; Ankur Sikarwar; Bryan Atista Kiely; Gabriel Kreiman; Zenglin Shi; Mengmi Zhang

arXiv:2211.12817 · cs.CV · February 24, 2026

Open Access

TL;DR

This paper investigates how humans and AI learn contextual relationships in scenes without explicit labels, demonstrating rapid human learning and introducing SeCo, a self-supervised model that effectively infers hidden objects based on context.

Contribution

The study combines psychophysics experiments with a novel self-supervised model, SeCo, to understand and replicate human-like contextual reasoning in scene understanding.

Findings

1. Humans quickly learn contextual associations without supervision.

2. SeCo outperforms existing self-supervised models in scene reasoning.

3. SeCo's predictions align closely with human behaviour.

Abstract

Humans rarely perceive objects in isolation but interpret scenes through relationships among co-occurring elements. How such contextual knowledge is acquired without explicit supervision remains unclear. Here we combine human psychophysics experiments with computational modelling to study the emergence of contextual reasoning. Participants were exposed to novel objects embedded in naturalistic scenes that followed predefined contextual rules capturing global context, local context and crowding. After viewing short training videos, participants completed a "lift-the-flap" task in which a hidden object had to be inferred from the surrounding context under variations in size, resolution and spatial arrangement. Humans rapidly learned these contextual associations without labels or feedback and generalised robustly across contextual changes. We then introduce SeCo (Self-supervised learning…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection