Human face perception reflects inverse-generative and naturalistic discriminative objectives
Wenxuan Guo, Heiko H. Schütt, Kamila Maria Jozwik, Katherine R. Storrs, Nikolaus Kriegeskorte, and Tal Golan

TL;DR
This study compares neural network models trained on different tasks against human face-dissimilarity judgments, revealing that models focusing on invariant, high-level features best match human perception.
Contribution
It shows that models trained on natural images and emphasizing invariant, high-level facial structure most closely match human face-dissimilarity judgments.
Findings
Models trained on natural images match human judgments better than models trained on synthetic images.
High-level, invariant structure-focused models best match human judgments.
Natural image training enhances model alignment with human face perception.
Abstract
The perceptual representations supporting our ability to recognize faces remain a computational mystery. Deep neural networks offer mechanistic hypotheses for human face perception, but theoretically distinct models often make indistinguishable representational predictions for randomly sampled faces. To expose diagnostic differences among these hypotheses, we compared six neural network models sharing an architecture but trained on distinct tasks, using face pairs optimized to elicit contrasting model predictions ("controversial" pairs) alongside randomly sampled pairs. We tested model predictions against face-dissimilarity judgments from 864 human participants across stimulus sets differing in realism and pose variation. Models prioritizing high-level, invariant structures (trained via inverse rendering, face identification, or object classification) most robustly matched human…
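The abstract describes two steps: finding face pairs on which two models disagree ("controversial" pairs), and scoring each model against human dissimilarity judgments. The Python sketch below illustrates both ideas under simplifying assumptions. All helper names (`most_controversial_pair`, `rank_normalize`, `spearman`) and the toy embeddings are hypothetical. The sketch merely *selects* the most controversial pair from a fixed candidate list by comparing rank-normalized embedding distances, whereas the authors *optimized* face pairs; it also assumes Spearman rank correlation as the model-human agreement measure, which may differ from the paper's evaluation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_ranks(values):
    """0-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0
        i = j + 1
    return ranks


def rank_normalize(values):
    """Rescale ranks to [0, 1] so models with different distance scales are comparable."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    return [r / (n - 1) for r in average_ranks(values)]


def most_controversial_pair(embed_a, embed_b, pairs):
    """Pick the candidate pair that model A finds most similar relative to model B.

    Hypothetical score: normalized distance under B minus normalized distance under A.
    """
    d_a = rank_normalize([euclidean(embed_a[i], embed_a[j]) for i, j in pairs])
    d_b = rank_normalize([euclidean(embed_b[i], embed_b[j]) for i, j in pairs])
    scores = [db - da for da, db in zip(d_a, d_b)]
    best = max(range(len(pairs)), key=lambda k: scores[k])
    return pairs[best], scores[best]


def spearman(x, y):
    """Spearman rank correlation between model-predicted and human dissimilarities."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy 1-D "embeddings" of three faces under two hypothetical models.
embed_a = {0: (0.0,), 1: (1.0,), 2: (10.0,)}
embed_b = {0: (0.0,), 1: (10.0,), 2: (4.0,)}
pairs = [(0, 1), (0, 2), (1, 2)]

pair, score = most_controversial_pair(embed_a, embed_b, pairs)
print(pair, score)  # (0, 1) 1.0

# Made-up human dissimilarity ratings for the same three pairs.
human = [2.0, 9.0, 7.0]
pred_a = [euclidean(embed_a[i], embed_a[j]) for i, j in pairs]
pred_b = [euclidean(embed_b[i], embed_b[j]) for i, j in pairs]
print(round(spearman(pred_a, human), 3), round(spearman(pred_b, human), 3))  # 1.0 -1.0
```

Rank-normalizing each model's distances before subtracting keeps one model's larger distance scale from dominating the controversy score; on this toy data, model A calls pair (0, 1) the most similar while model B calls it the most dissimilar, so the human rating of that pair is maximally informative for telling the two models apart.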
