Beyond the Fold: Quantifying Split-Level Noise and the Case for Leave-One-Dataset-Out AU Evaluation

Saurabh Hinduja; Gurmeet Kaur; Maneesh Bilalpur; Jeffrey Cohn; Shaun Canavan

arXiv:2604.02162 · cs.CV · April 3, 2026

TL;DR

This paper demonstrates that subject-exclusive cross-validation introduces stochastic variance in facial AU detection evaluation and advocates for leave-one-dataset-out validation for more stable, domain-aware assessment.

Contribution

The paper quantifies the split-level noise inherent in subject-exclusive cross-validation and shows how LODO evaluation exposes differences in model robustness across datasets.

Findings

01. On BP4D+, repeated 3-fold subject-exclusive splits produce an empirical noise floor of ±0.065 in average F1.

02. Model rankings can change under different fold assignments.

03. LODO evaluation exposes domain-level instability that single-dataset cross-validation does not show.
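
The first two findings rest on repeating subject-exclusive splits with different fold assignments. A minimal sketch of that procedure follows, using only the Python standard library. It is an illustration under stated assumptions, not the authors' code: `train_and_eval` is a hypothetical callback that trains on one set of subjects and returns F1 on another, and `half_range` is just one possible way to summarize spread, since the listing does not say how the paper computes its noise floor.

```python
import random
from statistics import mean


def subject_exclusive_folds(subject_ids, k=3, seed=0):
    """Split the unique subjects into k disjoint folds (no subject in two folds)."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)  # the seed controls the fold assignment
    return [set(subjects[i::k]) for i in range(k)]


def repeated_cv_scores(subject_ids, train_and_eval, k=3, seeds=range(10)):
    """Return the fold-averaged F1 once per seed.

    train_and_eval(train_subjects, test_subjects) -> float is a hypothetical
    callback standing in for model training and evaluation.
    """
    scores = []
    for seed in seeds:
        folds = subject_exclusive_folds(subject_ids, k=k, seed=seed)
        fold_f1 = []
        for i, test_subjects in enumerate(folds):
            # Train on every fold except the held-out one.
            train_subjects = set().union(*(f for j, f in enumerate(folds) if j != i))
            fold_f1.append(train_and_eval(train_subjects, test_subjects))
        scores.append(mean(fold_f1))
    return scores


def half_range(scores):
    """Assumed spread summary: half the gap between the best and worst run."""
    return (max(scores) - min(scores)) / 2
```

If two models differ by less than the spread measured this way, their ranking may simply reflect which fold assignment happened to be drawn.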

Abstract

Subject-exclusive cross-validation is the standard evaluation protocol for facial Action Unit (AU) detection, yet reported improvements are often small. We show that cross-validation itself introduces measurable stochastic variance. On BP4D+, repeated 3-fold subject-exclusive splits produce an empirical noise floor of $\pm 0.065$ in average F1, with substantially larger variation for low-prevalence AUs. Operating-point metrics such as F1 fluctuate more than threshold-independent measures such as AUC, and model ranking can change under different fold assignments. We further evaluate cross-dataset robustness using a Leave-One-Dataset-Out (LODO) protocol across five AU datasets. LODO removes partition randomness and exposes domain-level instability that is not visible under single-dataset cross-validation. Together, these results suggest that gains often reported in cross-fold validation…
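
To make the LODO protocol concrete, the sketch below enumerates its splits: each dataset is held out once for testing while the remaining ones are used for training. Because the partition is fixed by dataset identity, no random seed is involved. Apart from BP4D+, the dataset names are placeholders; the abstract does not list the other four datasets.

```python
def lodo_splits(dataset_names):
    """Return (held_out, training_datasets) pairs, one per dataset."""
    names = sorted(set(dataset_names))
    if len(names) < 2:
        raise ValueError("LODO needs at least two datasets")
    return [(held_out, [n for n in names if n != held_out]) for held_out in names]


# Placeholder names: only BP4D+ is named in the abstract.
splits = lodo_splits(["BP4D+", "dataset_B", "dataset_C", "dataset_D", "dataset_E"])
for held_out, train in splits:
    print(f"test on {held_out}, train on {train}")
```

With five datasets this gives five train/test configurations, and rerunning the enumeration always produces the same partitions.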
