Stage-Adaptive Reliability Modeling for Continuous Valence-Arousal Estimation

Yubeen Lee; Sangeun Lee; Junyeop Cha; Eunil Park

arXiv:2603.11468 · cs.MM · March 13, 2026

Open Access

TL;DR

This paper introduces SAGE, a framework that adaptively models and calibrates modality reliability during continuous valence-arousal estimation, improving robustness in real-world multimodal emotion recognition.

Contribution

SAGE is the first to explicitly estimate and incorporate modality reliability at different interaction stages for continuous affect prediction.

Findings

01. SAGE outperforms existing methods on the Aff-Wild2 benchmark.

02. Reliability-aware fusion improves stability under noise and occlusion.

03. Stage-adaptive modeling enhances continuous emotion estimation accuracy.

Abstract

Continuous valence-arousal estimation in real-world environments is challenging due to inconsistent modality reliability and interaction-dependent variability in audio-visual signals. Existing approaches primarily focus on modeling temporal dynamics, often overlooking the fact that modality reliability can vary substantially across interaction stages. To address this issue, we propose SAGE, a Stage-Adaptive reliability modeling framework that explicitly estimates and calibrates modality-wise confidence during multimodal integration. SAGE introduces a reliability-aware fusion mechanism that dynamically rebalances audio and visual representations according to their stage-dependent informativeness, preventing unreliable signals from dominating the prediction process. By separating reliability estimation from feature representation, the proposed framework enables more stable emotion…
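The reliability-aware fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the SAGE implementation, whose details are not given here. It assumes that each modality already has a scalar reliability score (in practice such a score would come from a learned estimator) and that fusion is a softmax-weighted convex combination of the audio and visual features. The function names and the scalar-score interface are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reliability_weighted_fusion(
    audio_feat: Sequence[float],
    visual_feat: Sequence[float],
    audio_score: float,
    visual_score: float,
) -> List[float]:
    """Fuse two equal-length feature vectors as a convex combination.

    The scores are hypothetical, unnormalised reliability logits. A real
    model would predict them from the inputs; that estimator is not shown.
    """
    if len(audio_feat) != len(visual_feat):
        raise ValueError("feature vectors must have the same length")
    # Normalise the two scores so the fusion weights are positive and sum to 1.
    w_audio, w_visual = softmax([audio_score, visual_score])
    return [w_audio * a + w_visual * v for a, v in zip(audio_feat, visual_feat)]


# Equal scores reduce to a plain average of the two modalities.
balanced = reliability_weighted_fusion([1.0, 0.0], [0.0, 1.0], 0.0, 0.0)

# A low visual score (for example, an occluded face) shifts the fused
# vector towards the audio features.
audio_led = reliability_weighted_fusion([1.0, 0.0], [0.0, 1.0], 2.0, -2.0)
```

Because the weights are normalised, a modality with a low score is down-weighted rather than discarded, which is one simple way to keep an unreliable signal from dominating the fused representation. Computing the scores separately from the features loosely mirrors the separation of reliability estimation from feature representation that the abstract mentions.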

Taxonomy

Topics: Emotion and Mood Recognition · Speech and Audio Processing · Music and Audio Processing