Label quality in AffectNet: results of crowd-based re-annotation
Doo Yon Kim, Christian Wallraven

TL;DR
This study re-annotated a subset of AffectNet with crowd-based labels and ratings, revealing differences from original labels and implications for affective computing model training.
Contribution
It provides a detailed analysis of label quality and consistency in AffectNet using crowd-based re-annotation, highlighting differences from original labels and effects on model predictions.
Findings
Crowd-based labels shift towards neutral and happy categories.
Human ratings for valence show excellent agreement.
Weakly-trained models predict crowd voting patterns better.
Abstract
AffectNet is one of the most popular resources for facial expression recognition (FER) on relatively unconstrained in-the-wild images. However, given that images were annotated by only one annotator with limited consistency checks on the data, label quality and consistency may be limited. Here, we take a similar approach to a study that re-labeled another, smaller dataset (FER2013) with crowd-based annotations, and report results from a re-labeling and re-annotation of a subset of difficult AffectNet faces by 13 people, covering both expression labels and valence and arousal ratings. Our results show that human labels overall have medium to good consistency, whereas human ratings, especially for valence, are in excellent agreement. Importantly, however, crowd-based labels shift significantly towards the neutral and happy categories, and crowd-based affective ratings form a consistent pattern…
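To make the idea of crowd-based labels and their consistency concrete, here is a minimal sketch in Python. It assumes each image receives one categorical vote from each of 13 annotators, takes the majority vote as the crowd label, and measures agreement with Fleiss' kappa. The abstract does not state which agreement statistic the authors used, so Fleiss' kappa is an assumption made for illustration, and the function names and toy votes are hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the most frequent label among one image's votes.

    Ties are broken in favor of the label that appears first in `votes`.
    """
    return Counter(votes).most_common(1)[0][0]


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of per-image vote lists.

    Every image must have the same number of votes. Returns NaN when all
    votes fall into a single category, where kappa is undefined.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of raters who assigned category j to image i
    counts = [[Counter(r)[c] for c in categories] for r in ratings]
    # Observed agreement per image, then averaged over images
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Expected agreement by chance, from overall category proportions
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Toy example (made-up votes, not data from the paper): 13 votes per image
toy_votes = [
    ["happy"] * 10 + ["neutral"] * 3,
    ["neutral"] * 8 + ["sad"] * 5,
    ["happy"] * 13,
]
crowd_labels = [majority_label(v) for v in toy_votes]
kappa = fleiss_kappa(toy_votes, ["happy", "neutral", "sad"])
```

Comparing crowd_labels against the original single-annotator labels would then show how often, and in which direction, the crowd label differs from the original one.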
Taxonomy
Topics: Emotion and Mood Recognition
