EmoCAM: Toward Understanding What Drives CNN-based Emotion Recognition

Youssef Doulfoukar; Laurent Mertens; Joost Vennekens

arXiv:2407.14314·cs.CV·July 22, 2024

Open Access

TL;DR

This paper introduces EmoCAM, a framework that combines CAM-based techniques with corpus-level Object Detection to interpret a CNN-based emotion recognition model (EmoNet). It shows that the model relies mostly on human characteristics and that specific image modifications have a pronounced effect on its predictions.

Contribution

The paper presents an interpretability framework for CNN-based emotion recognition that clarifies which image cues drive a model's decisions and how image alterations affect them.

Findings

01. Models focus mainly on human features.

02. Image modifications significantly influence model predictions.

03. Framework improves interpretability of emotion recognition CNNs.

Abstract

Convolutional Neural Networks are particularly suited for image analysis tasks, such as Image Classification, Object Recognition or Image Segmentation. Like all Artificial Neural Networks, however, they are "black box" models, and suffer from poor explainability. This work is concerned with the specific downstream task of Emotion Recognition from images, and proposes a framework that combines CAM-based techniques with Object Detection on a corpus level to better understand on which image cues a particular model, in our case EmoNet, relies to assign a specific emotion to an image. We demonstrate that the model mostly focuses on human characteristics, but also explore the pronounced effect of specific image modifications.
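The idea of pairing class activation maps with object detections across a corpus can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the classic CAM formulation (a weighted sum of the last convolutional feature maps), assumes detections are already available as labelled boxes in CAM grid coordinates, and uses hypothetical function names (`class_activation_map`, `activation_share`, `corpus_object_attention`). The abstract does not say which CAM variant or which detector is used with EmoNet.

```python
from collections import defaultdict

import numpy as np


def class_activation_map(feature_maps, class_weights):
    """Classic CAM (assumed variant): weighted sum of feature maps over channels.

    feature_maps: array of shape (C, H, W) from the last conv layer.
    class_weights: array of shape (C,), final-layer weights of the target class.
    Returns an (H, W) map, clipped at zero and scaled to a peak of 1.
    """
    cam = np.tensordot(class_weights, feature_maps, axes=([0], [0]))
    cam = np.maximum(cam, 0.0)  # keep only positive evidence for the class
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def activation_share(cam, box):
    """Fraction of the total CAM mass inside box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    total = cam.sum()
    if total == 0:
        return 0.0
    return float(cam[y0:y1, x0:x1].sum() / total)


def corpus_object_attention(samples):
    """Aggregate activation shares per predicted emotion and detected object label.

    samples: iterable of (emotion, cam, detections), where detections is a
    list of (label, box) pairs (hypothetical detector output format).
    Returns {emotion: {label: mean activation share over all detections}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for emotion, cam, detections in samples:
        for label, box in detections:
            sums[emotion][label] += activation_share(cam, box)
            counts[emotion][label] += 1
    return {
        emotion: {label: sums[emotion][label] / counts[emotion][label]
                  for label in sums[emotion]}
        for emotion in sums
    }


# Toy usage: two 4x4 feature channels, one active top-left, one bottom-right.
features = np.zeros((2, 4, 4))
features[0, :2, :2] = 1.0
features[1, 2:, 2:] = 1.0
cam = class_activation_map(features, np.array([1.0, -1.0]))
stats = corpus_object_attention(
    [("joy", cam, [("person", (0, 0, 2, 2)), ("dog", (2, 2, 4, 4))])]
)
```

In this toy case all of the activation falls inside the "person" box, so a corpus-level summary would attribute the "joy" prediction to that object class. A real pipeline would also need to rescale detector boxes to the CAM resolution, which is omitted here.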

Taxonomy

Topics: Emotion and Mood Recognition