Stage-Adaptive Reliability Modeling for Continuous Valence-Arousal Estimation
Yubeen Lee, Sangeun Lee, Junyeop Cha, Eunil Park

TL;DR
This paper introduces SAGE, a framework that adaptively models and calibrates modality reliability during continuous valence-arousal estimation, improving robustness in real-world multimodal emotion recognition.
Contribution
SAGE is the first to explicitly estimate and incorporate modality reliability at different interaction stages for continuous affect prediction.
Findings
SAGE outperforms existing methods on the Aff-Wild2 benchmark.
Reliability-aware fusion improves stability under noise and occlusion.
Stage-adaptive modeling enhances continuous emotion estimation accuracy.
Abstract
Continuous valence-arousal estimation in real-world environments is challenging due to inconsistent modality reliability and interaction-dependent variability in audio-visual signals. Existing approaches primarily focus on modeling temporal dynamics, often overlooking the fact that modality reliability can vary substantially across interaction stages. To address this issue, we propose SAGE, a Stage-Adaptive reliability modeling framework that explicitly estimates and calibrates modality-wise confidence during multimodal integration. SAGE introduces a reliability-aware fusion mechanism that dynamically rebalances audio and visual representations according to their stage-dependent informativeness, preventing unreliable signals from dominating the prediction process. By separating reliability estimation from feature representation, the proposed framework enables more stable emotion…
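The abstract does not spell out how the reliability-aware fusion is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each modality comes with a scalar reliability score (hypothetical inputs `audio_score` and `visual_score`), turns the two scores into weights with a softmax, and blends same-sized feature vectors with those weights. The function names and the softmax gating are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reliability_weighted_fusion(audio_feat, visual_feat, audio_score, visual_score):
    """Blend two feature vectors using weights derived from reliability scores.

    Assumption: the scores are unnormalized scalars produced by some separate
    reliability estimator; how SAGE actually estimates them is not stated in
    the abstract.
    """
    if len(audio_feat) != len(visual_feat):
        raise ValueError("feature vectors must have the same length")
    w_audio, w_visual = softmax([audio_score, visual_score])
    fused = [w_audio * a + w_visual * v for a, v in zip(audio_feat, visual_feat)]
    return fused, (w_audio, w_visual)


# Example: audio is judged unreliable (low score), so the visual features dominate.
fused, weights = reliability_weighted_fusion(
    [1.0, 0.0], [0.0, 1.0], audio_score=-2.0, visual_score=2.0
)
```

Because the weights sum to one, a modality with a low score is down-weighted rather than discarded, which is one simple way to keep an unreliable signal from dominating the fused representation.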
Taxonomy
Topics: Emotion and Mood Recognition · Speech and Audio Processing · Music and Audio Processing
