Conflict-Aware Multimodal Fusion for Ambivalence and Hesitancy Recognition
Salah Eddine Bekhouche, Hichem Telli, Azeddine Benlamoudi, Salah Eddine Herrouz, Abdelmalik Taleb-Ahmed, and Abdenour Hadid

TL;DR
This paper introduces ConflictAwareAH, a multimodal framework that detects ambivalence and hesitancy by analyzing disagreements across speech, face, and text signals, reaching 0.694 Macro F1 on the BAH dataset.
Contribution
It proposes a conflict-aware multimodal fusion method that uses pairwise element-wise absolute differences between modality embeddings to better recognize subtle affective states, addressing limitations of text-only approaches.
Findings
Improves F1-NoAH by 4.6 points over text-only models.
Halves the performance gap between the A/H and NoAH classes.
Achieves state-of-the-art results on the BAH dataset with 0.694 Macro F1.
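To make the metrics in these findings concrete, the following minimal sketch computes per-class F1, Macro F1 (the unweighted mean of the per-class scores), and the gap between the two classes. The label encoding (1 for A/H, 0 for NoAH), the function names, and the toy labels are assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score for a single class, treating `cls` as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1_and_gap(y_true, y_pred, classes=(0, 1)):
    """Return (macro F1, absolute gap between the two per-class F1 scores)."""
    scores = [f1_for_class(y_true, y_pred, c) for c in classes]
    macro = sum(scores) / len(scores)
    gap = abs(scores[0] - scores[1])
    return macro, gap


# Toy labels, hypothetical: 1 = A/H, 0 = NoAH.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0]

f1_ah = f1_for_class(y_true, y_pred, 1)
f1_noah = f1_for_class(y_true, y_pred, 0)
macro, gap = macro_f1_and_gap(y_true, y_pred)
print(f"F1-AH={f1_ah:.3f}  F1-NoAH={f1_noah:.3f}  Macro F1={macro:.3f}  gap={gap:.3f}")
```

Because Macro F1 weights both classes equally, a model that favours one class is penalised even when its overall accuracy looks acceptable, which is why the gap between classes is reported alongside it.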
Abstract
Ambivalence and hesitancy (A/H) are subtle affective states where a person shows conflicting signals through different channels: saying one thing while their face or voice tells another story. Recognising these states automatically is valuable in clinical settings, but it is hard for machines because the key evidence lives in the disagreements between what is said, how it sounds, and what the face shows. We present ConflictAwareAH, a multimodal framework built for this problem. Three pre-trained encoders extract video, audio, and text representations. Pairwise conflict features, computed as element-wise absolute differences between modality embeddings, serve as bidirectional cues: large cross-modal differences flag A/H, while small differences confirm behavioural consistency and anchor the negative class. This conflict-aware design addresses a key limitation of…
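The pairwise conflict features described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the three modality embeddings have already been projected to a shared dimension (the excerpt does not say how), and the function name `conflict_features` and the toy vectors are hypothetical. How these features are fused with the unimodal embeddings downstream is not covered here.

```python
from itertools import combinations


def conflict_features(embeddings):
    """Element-wise absolute difference for every pair of modalities.

    `embeddings` maps a modality name to a list of floats. All vectors must
    have the same length (an assumption of this sketch).
    """
    dims = {len(vec) for vec in embeddings.values()}
    if len(dims) != 1:
        raise ValueError("all modality embeddings must share one dimension")

    features = {}
    # Sort names so the pair order is deterministic.
    for a, b in combinations(sorted(embeddings), 2):
        features[(a, b)] = [abs(x - y) for x, y in zip(embeddings[a], embeddings[b])]
    return features


# Toy embeddings, hypothetical values chosen only for illustration.
toy = {
    "video": [0.5, 1.0, -1.0],
    "audio": [0.5, 0.0, -1.0],
    "text": [0.5, 1.0, 1.0],
}

conflicts = conflict_features(toy)
for pair, diff in conflicts.items():
    print(pair, diff)
```

Large entries mark dimensions where two modalities disagree, while entries near zero indicate agreement; this is the sense in which the abstract calls the cue bidirectional.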
Taxonomy
Topics: Emotion and Mood Recognition · Stuttering Research and Treatment · Multimodal Machine Learning Applications
