Capturing the objects of vision with neural networks
Benjamin Peters, Nikolaus Kriegeskorte

TL;DR
This paper reviews how human visual perception and deep neural networks process objects, emphasizing the importance of object representations for perception, for cognition, and for advancing neural network models.
Contribution
It synthesizes insights from cognitive science and deep learning to propose new experimental benchmarks for object perception in neural networks.
Findings
Human perception decomposes scenes into objects through grouping and completion.
Deep neural networks excel at object labeling but lack human-like object representations.
The paper proposes integrating cognitive benchmarks to improve object understanding in neural networks.
Abstract
Human visual perception carves a scene at its physical joints, decomposing the world into objects, which are selectively attended, tracked, and predicted as we engage our surroundings. Object representations emancipate perception from the sensory input, enabling us to keep in mind that which is out of sight and to use perceptual content as a basis for action and symbolic cognition. Human behavioral studies have documented how object representations emerge through grouping, amodal completion, proto-objects, and object files. Deep neural network (DNN) models of visual object recognition, by contrast, remain largely tethered to the sensory input, despite achieving human-level performance at labeling objects. Here, we review related work in both fields and examine how these fields can help each other. The cognitive literature provides a starting point for the development of new experimental…
