TL;DR
This paper introduces a neural network framework, trained on naturalistic multi-view data, that predicts human-like 3D shape perception without task-specific training, matching both human accuracy and human behavioral patterns.
Contribution
The authors develop a novel multi-view learning model that predicts human 3D shape inferences directly from experimental stimuli, showing that human-level performance can emerge without object-related inductive biases.
Findings
Model matches human accuracy on 3D shape inference tasks.
Model responses predict human error patterns and reaction times.
Presented as the first framework to reach human-level 3D shape inference when trained only on naturalistic data.
Abstract
Humans can infer the three-dimensional structure of objects from two-dimensional visual inputs. Modeling this ability has been a longstanding goal for the science and engineering of visual intelligence, yet decades of computational methods have fallen short of human performance. Here we develop a modeling framework that predicts human 3D shape inferences for arbitrary objects, directly from experimental stimuli. We achieve this with a novel class of neural networks trained using a visual-spatial objective over naturalistic sensory data; given a set of images taken from different locations within a natural scene, these models learn to predict spatial information related to these images, such as camera location and visual depth, without relying on any object-related inductive biases. Notably, these visual-spatial signals are analogous to sensory cues readily available to humans. We design…
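To make the visual-spatial objective in the abstract more concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical model that outputs a predicted camera location and a flattened depth map for a view, and it scores them with two mean-squared-error terms combined by an assumed weight `depth_weight`. The function names, the choice of loss, and the weighting are all illustrative assumptions; the only point carried over from the abstract is that the targets are spatial signals (camera location, depth) and no object labels appear anywhere.

```python
from typing import Sequence


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average squared difference between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def visual_spatial_loss(
    pred_camera: Sequence[float],
    true_camera: Sequence[float],
    pred_depth: Sequence[float],
    true_depth: Sequence[float],
    depth_weight: float = 1.0,  # hypothetical weighting, not taken from the paper
) -> float:
    """Hypothetical objective: camera-location error plus weighted depth error.

    Only spatial targets are used; there is no object category or shape label.
    """
    camera_term = mean_squared_error(pred_camera, true_camera)
    depth_term = mean_squared_error(pred_depth, true_depth)
    return camera_term + depth_weight * depth_term


# Toy example: a 3D camera location and a flattened 4-pixel depth map.
loss = visual_spatial_loss(
    pred_camera=[0.0, 0.0, 1.0],
    true_camera=[0.0, 0.0, 2.0],
    pred_depth=[1.0, 2.0, 3.0, 4.0],
    true_depth=[1.0, 2.0, 3.0, 6.0],
)
print(loss)
```

In a real training setup these terms would be computed over batches of image sets and minimized by gradient descent; the sketch only shows how the two spatial signals might enter a single scalar objective.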
