I'm Fine, But My Voice Isn't: Cross-Modal Affective Dissonance Detection for Reflective Journaling
Sumin Lee

TL;DR
This paper introduces a cross-modal affective dissonance detection framework for reflective journaling: a new dataset, a dual-encoder model with asymmetric attention, and an analysis of the domain gap that limits real-world application.
Contribution
It formalizes Cross-Modal Affective Dissonance Detection (CADD), introduces the CADD-Journal dataset, proposes DACM, a dual-encoder model with asymmetric cross-modal attention, and evaluates the domain gap between TTS-trained models and naturalistic speech.
Findings
DACM achieves macro-F1 0.711 on affective dissonance detection.
Asymmetric cross-modal attention is the dominant driver of performance (+0.242 in the four-step ablation).
A substantial domain gap exists between TTS-trained models and real speech.
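For readers unfamiliar with the metric, macro-F1 is the unweighted mean of the per-class F1 scores, so each of the three CADD classes counts equally regardless of how many samples it has. The sketch below shows the computation on made-up toy labels; the function name `macro_f1` and the example predictions are illustrative assumptions and are not taken from the paper or its evaluation code.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (hypothetical helper, not the paper's code)."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, invented for illustration only.
labels = ["masking", "coping", "congruent"]
y_true = ["masking", "coping", "congruent", "congruent"]
y_pred = ["masking", "congruent", "congruent", "congruent"]
score = macro_f1(y_true, y_pred, labels)
print(round(score, 3))
```

Because the average is unweighted, missing the single Coping sample in this toy example pulls the score down as much as missing an entire majority class would.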
Abstract
Digital journaling creates an authenticity gap: users consciously translate raw emotions into text, often sanitizing narratives even in private writing. We formalize this as Cross-Modal Affective Dissonance Detection (CADD), a directional three-way classification distinguishing Masking (positive text, negative acoustics), Coping (negative text, positive acoustics), and Congruent utterances, grounded in Gross's process model of emotion regulation. We present three further contributions: (i) CADD-Journal, a 1,800-sample TTS dataset with a shared-sentence-pool design that provably isolates acoustic signal from textual content; (ii) DACM, a dual-encoder model with asymmetric cross-modal attention that resolves a gradient degeneracy in pooled fusion, achieving macro-F1 0.711, with a four-step ablation demonstrating that asymmetric attention is the dominant driver (+0.242) while the DIM is…
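The directional three-way label scheme described in the abstract can be sketched as a simple rule over two polarity signals. This is a minimal illustration only: the scalar valence scores, the `threshold` parameter, the treatment of neutral values as non-positive, and the names `Label` and `cadd_label` are all assumptions made here, not details from the paper, whose DACM model learns the classification from text and audio encoders.

```python
from enum import Enum


class Label(Enum):
    MASKING = "masking"      # positive text, negative acoustics
    COPING = "coping"        # negative text, positive acoustics
    CONGRUENT = "congruent"  # text and acoustics agree


def cadd_label(text_valence: float, acoustic_valence: float, threshold: float = 0.0) -> Label:
    """Map two valence scores to a CADD label (hypothetical rule for illustration).

    Assumes each score is a real number where values above `threshold`
    count as positive and everything else counts as negative.
    """
    text_positive = text_valence > threshold
    acoustic_positive = acoustic_valence > threshold
    if text_positive and not acoustic_positive:
        return Label.MASKING
    if acoustic_positive and not text_positive:
        return Label.COPING
    return Label.CONGRUENT


# "I'm fine" said in a flat, strained voice: positive words, negative acoustics.
print(cadd_label(text_valence=0.7, acoustic_valence=-0.5).value)
```

The rule makes the directionality explicit: Masking and Coping are both dissonant, but they differ in which modality carries the negative signal.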
