Human-level 3D shape perception emerges from multi-view learning

Tyler Bonnen; Jitendra Malik; Angjoo Kanazawa

arXiv:2602.17650 · cs.CV · April 1, 2026

TL;DR

This paper introduces a neural network framework, trained on naturalistic multi-view data, that predicts human 3D shape inferences without task-specific training and matches both human accuracy and human behavioral patterns.

Contribution

The authors develop a novel class of multi-view learning models that predict human 3D shape inferences directly from experimental stimuli, showing that human-level 3D shape perception can emerge without object-related inductive biases.

Findings

01 Model matches human accuracy on 3D shape inference tasks.

02 Model responses predict human error patterns and reaction times.

03 First framework to achieve human-level 3D perception from naturalistic data.

Abstract

Humans can infer the three-dimensional structure of objects from two-dimensional visual inputs. Modeling this ability has been a longstanding goal for the science and engineering of visual intelligence, yet decades of computational methods have fallen short of human performance. Here we develop a modeling framework that predicts human 3D shape inferences for arbitrary objects, directly from experimental stimuli. We achieve this with a novel class of neural networks trained using a visual-spatial objective over naturalistic sensory data; given a set of images taken from different locations within a natural scene, these models learn to predict spatial information related to these images, such as camera location and visual depth, without relying on any object-related inductive biases. Notably, these visual-spatial signals are analogous to sensory cues readily available to humans. We design…
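
To make the idea of a visual-spatial objective concrete, here is a minimal sketch of a loss that scores predicted camera locations and depth values against ground truth for a set of views of one scene. The abstract does not specify the form of the loss, so everything here is an assumption made for illustration: the function names (`camera_error`, `depth_error`, `visual_spatial_loss`), the choice of squared and absolute error, and the `depth_weight` parameter are hypothetical and are not taken from the paper or its code.

```python
from typing import Sequence

Vec3 = Sequence[float]       # hypothetical: a camera position (x, y, z)
DepthMap = Sequence[float]   # hypothetical: a flattened per-view depth map


def camera_error(pred: Sequence[Vec3], true: Sequence[Vec3]) -> float:
    """Mean squared Euclidean distance between predicted and true camera positions."""
    total = 0.0
    for p, t in zip(pred, true):
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, t))
    return total / len(pred)


def depth_error(pred: Sequence[DepthMap], true: Sequence[DepthMap]) -> float:
    """Mean absolute depth difference over all pixels of all views."""
    total = 0.0
    count = 0
    for p_map, t_map in zip(pred, true):
        for p, t in zip(p_map, t_map):
            total += abs(p - t)
            count += 1
    return total / count


def visual_spatial_loss(
    pred_cams: Sequence[Vec3],
    true_cams: Sequence[Vec3],
    pred_depths: Sequence[DepthMap],
    true_depths: Sequence[DepthMap],
    depth_weight: float = 1.0,  # assumed weighting, not from the paper
) -> float:
    """Camera-location error plus weighted depth error for one multi-view scene."""
    return camera_error(pred_cams, true_cams) + depth_weight * depth_error(
        pred_depths, true_depths
    )


if __name__ == "__main__":
    # Two views of a toy scene: the second camera is off by 1 unit along z,
    # and one of the four depth values is off by 0.5.
    cams_true = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
    cams_pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    depths_true = [[1.0, 2.5], [3.0, 4.0]]
    depths_pred = [[1.0, 2.0], [3.0, 4.0]]
    print(visual_spatial_loss(cams_pred, cams_true, depths_pred, depths_true, 2.0))
```

Note that neither term refers to object identity or category, which mirrors the abstract's point that the training signal involves only spatial quantities such as camera location and depth.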

Code & Models

Repositories

https://tzler.github.io/human_multiview
github
