Revisiting Marr in Face: The Building of 2D--2.5D--3D Representations in Deep Neural Networks
Xiangyu Zhu, Chang Yu, Jiankuo Zhao, Zhaoxiang Zhang, Stan Z. Li, Zhen Lei

TL;DR
This paper investigates how deep neural networks process visual information in stages similar to Marr's 2D, 2.5D, and 3D vision theory, revealing a progression from 2D to 3D representations with an intermediate hybrid stage.
Contribution
It introduces a graphics probe method to analyze DNNs, providing empirical evidence that they process visual representations in stages consistent with Marr's theory.
Findings
DNNs encode images as 2D in low-level layers
High-level layers construct 3D representations
Mid-level layers exhibit a hybrid 2.5D state
Abstract
David Marr's seminal theory of vision proposes that the human visual system operates through a sequence of three stages, known as the 2D sketch, the 2.5D sketch, and the 3D model. In recent years, Deep Neural Networks (DNNs) have been widely thought to have reached a level comparable to human vision. However, the mechanisms by which DNNs accomplish this, and whether they adhere to Marr's 2D--2.5D--3D construction theory, remain unexplored. In this paper, we delve into the perception task to explore these questions and find evidence supporting Marr's theory. We introduce a graphics probe, a sub-network crafted to reconstruct the original image from the network's intermediate layers. The key to the graphics probe is its flexible architecture, which supports images in both 2D and 3D formats, as well as in a transitional state between them. By injecting graphics probes into neural networks, and…
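The general idea of probing, reading out what an intermediate layer still encodes by training a small decoder on its activations, can be illustrated with a toy sketch. The code below is not the paper's graphics probe and does not model 2D or 3D image formats. It assumes a small random ReLU network on synthetic vectors and fits a plain least-squares reconstruction probe at each layer. All function names and parameters (`forward_with_activations`, `fit_linear_probe`, `probe_error`, `run_probe_experiment`, the layer widths) are hypothetical and chosen only for illustration.

```python
import numpy as np


def forward_with_activations(x, weights):
    """Run a tiny ReLU MLP and return the activations of every layer."""
    activations = []
    h = x
    for w in weights:
        h = np.maximum(h @ w, 0.0)  # linear map followed by ReLU
        activations.append(h)
    return activations


def _with_bias(features):
    """Append a constant column so the probe can learn an offset."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_linear_probe(features, targets):
    """Fit a least-squares map from layer features back to the inputs."""
    coef, *_ = np.linalg.lstsq(_with_bias(features), targets, rcond=None)
    return coef


def probe_error(features, targets, coef):
    """Mean squared reconstruction error of a fitted probe."""
    return float(np.mean((_with_bias(features) @ coef - targets) ** 2))


def run_probe_experiment(seed=0, n_train=512, n_test=128, dim=16,
                         widths=(12, 8, 4)):
    """Attach one reconstruction probe per layer of a frozen toy network.

    Returns the held-out reconstruction error for each layer, in order.
    """
    rng = np.random.default_rng(seed)
    x_train = rng.normal(size=(n_train, dim))
    x_test = rng.normal(size=(n_test, dim))

    # Frozen random weights stand in for a trained network (assumption).
    weights = []
    fan_in = dim
    for width in widths:
        weights.append(rng.normal(size=(fan_in, width)) / np.sqrt(fan_in))
        fan_in = width

    train_acts = forward_with_activations(x_train, weights)
    test_acts = forward_with_activations(x_test, weights)

    errors = []
    for a_train, a_test in zip(train_acts, test_acts):
        coef = fit_linear_probe(a_train, x_train)  # probe is fit on train data
        errors.append(probe_error(a_test, x_test, coef))  # scored on held-out
    return errors


if __name__ == "__main__":
    for layer, err in enumerate(run_probe_experiment(), start=1):
        print(f"layer {layer}: held-out reconstruction MSE = {err:.4f}")
```

Comparing the per-layer errors shows how much of the input each layer retains under this particular decoder. A probe that can switch between 2D and 3D output formats, as the abstract describes, would need a richer decoder than the linear map assumed here.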
Taxonomy
Topics: Image Processing and 3D Reconstruction
