Comparing Facial Expression Recognition in Humans and Machines: Using CAM, GradCAM, and Extremal Perturbation
Serin Park, Christian Wallraven

TL;DR
This study compares human and machine facial expression recognition (FER) performance and attention patterns. It finds that humans outperform machines and that Extremal Perturbation matches human attention best.
Contribution
It provides a direct comparison of human and DNN-based FER performance and attention, using explainable AI techniques and human click data.
Findings
Humans significantly outperform machines in FER accuracy.
Extremal Perturbation best matches human attention patterns.
Machine attention methods vary in similarity to human attention.
Abstract
Facial expression recognition (FER) attracts significant research in both psychology and machine learning and has a wide range of applications. Despite a wealth of research on human FER and considerable progress in computational FER made possible by deep neural networks (DNNs), comparatively little work has examined how closely DNNs match human performance. In this work, we compared the recognition performance and attention patterns of humans and machines during a two-alternative forced-choice FER task. Human attention was gathered through click data that progressively uncovered a face, whereas model attention was obtained using three popular techniques from explainable AI: CAM, GradCAM and Extremal Perturbation. In both cases, performance was measured as percent correct. For this task, we found that humans outperformed…
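As a rough illustration of two of the attention techniques named in the abstract, the sketch below computes a CAM and a Grad-CAM heatmap with NumPy. It is not the authors' implementation: the summary does not describe their models or code, so the function names (`cam`, `grad_cam`, `normalize`), the toy array shapes, and the random inputs are all assumptions made for illustration. It assumes the standard definitions: CAM is a class-weighted sum of the final convolutional feature maps, and Grad-CAM weights each feature map by its spatially averaged gradient and then applies a ReLU. Extremal Perturbation, which optimizes a mask over the input, is not sketched here.

```python
import numpy as np

def cam(feature_maps, class_weights):
    """Class activation map: class-weighted sum of feature maps.

    feature_maps: array of shape (K, H, W), final conv-layer activations.
    class_weights: array of shape (K,), linear-head weights for one class.
    Returns an (H, W) heatmap.
    """
    return np.tensordot(class_weights, feature_maps, axes=1)

def grad_cam(feature_maps, gradients):
    """Grad-CAM: weight each map by its mean gradient, sum, then ReLU.

    gradients: array of shape (K, H, W), d(class score)/d(feature_maps).
    Returns an (H, W) non-negative heatmap.
    """
    alphas = gradients.mean(axis=(1, 2))          # one weight per channel
    return np.maximum(np.tensordot(alphas, feature_maps, axes=1), 0.0)

def normalize(heatmap):
    """Rescale a heatmap to [0, 1]; a constant map becomes all zeros."""
    lo, hi = heatmap.min(), heatmap.max()
    if hi > lo:
        return (heatmap - lo) / (hi - lo)
    return np.zeros_like(heatmap)

# Toy stand-ins for real network activations (hypothetical values).
rng = np.random.default_rng(0)
K, H, W = 4, 7, 7
A = rng.random((K, H, W))      # feature maps
w = rng.normal(size=K)         # class weights of a linear head

# For a global-average-pooling + linear head, the class score is
# sum_k w_k * mean(A_k), so its gradient w.r.t. A_k is w_k / (H * W)
# at every spatial location.
grads = np.broadcast_to(w[:, None, None] / (H * W), A.shape)

cam_map = cam(A, w)
gradcam_map = grad_cam(A, grads)
```

In this toy setting the two maps differ only by the ReLU and a constant factor of `1 / (H * W)`, which is the known relationship between CAM and Grad-CAM for networks that end in global average pooling and a linear layer. A map rescaled with `normalize` could then be compared against a human click-density map, although the similarity measure used in the paper is not stated in this summary.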
Taxonomy
Topics: Emotion and Mood Recognition · Face and Expression Recognition · Neural Networks and Applications
Methods: Class-activation map
