Revisiting Marr in Face: The Building of 2D--2.5D--3D Representations in Deep Neural Networks

Xiangyu Zhu, Chang Yu, Jiankuo Zhao, Zhaoxiang Zhang, Stan Z. Li, Zhen Lei

arXiv:2411.16148·cs.CV·November 26, 2024

TL;DR

This paper investigates how deep neural networks process visual information in stages similar to Marr's 2D, 2.5D, and 3D vision theory, revealing a progression from 2D to 3D representations with an intermediate hybrid stage.

Contribution

The paper introduces a graphics probe, a sub-network that reconstructs the original image from a network's intermediate layers, and uses it to provide empirical evidence that DNNs build visual representations in stages consistent with Marr's theory.

Findings

1. DNNs encode images as 2D in low-level layers
2. High-level layers construct 3D representations
3. Mid-level layers exhibit a hybrid 2.5D state

Abstract

David Marr's seminal theory of vision proposes that the human visual system operates through a sequence of three stages, known as the 2D sketch, the 2.5D sketch, and the 3D model. In recent years, Deep Neural Networks (DNNs) have been widely thought to have reached a level comparable to human vision. However, the mechanisms by which DNNs accomplish this, and whether they adhere to Marr's 2D--2.5D--3D construction theory, remain unexplored. In this paper, we delve into the perception task to explore these questions and find evidence supporting Marr's theory. We introduce a graphics probe, a sub-network crafted to reconstruct the original image from the network's intermediate layers. The key to the graphics probe is its flexible architecture, which supports images in both 2D and 3D formats, as well as in a transitional state between them. By injecting graphics probes into neural networks, and…
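
The general idea of probing intermediate layers by reconstruction can be illustrated with a toy sketch. This is not the paper's graphics probe, which supports 2D and 3D image formats; it is a plain linear reconstruction probe attached to a hypothetical frozen two-layer network, written in pure Python. All names here (`backbone_features`, `train_probe`, `recon_loss`, the layer sizes, the learning rate) are assumptions made for illustration only.

```python
import math
import random

random.seed(0)

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Hypothetical frozen "backbone": two tanh layers with fixed random weights.
D_IN, D_HID = 6, 4
W1 = rand_matrix(D_HID, D_IN)
W2 = rand_matrix(D_HID, D_HID)

def backbone_features(x):
    """Return the activations of each layer for input x (backbone is never trained)."""
    h1 = [math.tanh(a) for a in matvec(W1, x)]
    h2 = [math.tanh(a) for a in matvec(W2, h1)]
    return {"layer1": h1, "layer2": h2}

def recon_loss(P, feats, inputs):
    """Mean squared reconstruction error of probe P (bias folded in as a constant 1)."""
    total = 0.0
    for f, x in zip(feats, inputs):
        pred = matvec(P, f + [1.0])
        total += sum((p - t) ** 2 for p, t in zip(pred, x))
    return total / len(inputs)

def train_probe(feats, inputs, steps=200, lr=0.05):
    """Fit a linear probe by full-batch gradient descent; only the probe is updated."""
    d_out, d_feat = len(inputs[0]), len(feats[0]) + 1
    P = [[0.0] * d_feat for _ in range(d_out)]
    for _ in range(steps):
        grad = [[0.0] * d_feat for _ in range(d_out)]
        for f, x in zip(feats, inputs):
            fb = f + [1.0]
            pred = matvec(P, fb)
            for i in range(d_out):
                err = 2.0 * (pred[i] - x[i]) / len(inputs)
                for j in range(d_feat):
                    grad[i][j] += err * fb[j]
        for i in range(d_out):
            for j in range(d_feat):
                P[i][j] -= lr * grad[i][j]
    return P

# Toy "images": random vectors. One probe is trained per probed layer.
inputs = [[random.uniform(-1.0, 1.0) for _ in range(D_IN)] for _ in range(64)]
results = {}
for name in ("layer1", "layer2"):
    feats = [backbone_features(x)[name] for x in inputs]
    zero_probe = [[0.0] * (D_HID + 1) for _ in range(D_IN)]
    before = recon_loss(zero_probe, feats, inputs)
    after = recon_loss(train_probe(feats, inputs), feats, inputs)
    results[name] = (before, after)
    print(name, round(before, 4), round(after, 4))
```

Comparing the reconstruction error per layer is the basic readout of such a probe. The approach described in the abstract goes further by varying the format in which the probe reconstructs the image (2D, 3D, or a transitional state), which this sketch does not attempt.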

Taxonomy

Topics: Image Processing and 3D Reconstruction