Paralinguistic Emotion-Aware Validation Timing Detection in Japanese Empathetic Spoken Dialogue
Zi Haur Pang, Yahui Fu, Yuan Gao, Tatsuya Kawahara

TL;DR
This paper presents a speech-based model that detects optimal validation timing in empathetic spoken dialogue by integrating paralinguistic and emotional cues, enhancing human-robot interaction.
Contribution
It introduces a novel speech-only approach using self-supervised learning and multi-task emotion classification for validation timing detection without textual data.
Findings
Significant improvement over baseline models in validation timing detection
Effective fusion of paralinguistic and emotional speech features
Non-linguistic cues are sufficient for emotional validation detection
Abstract
Emotional Validation is a psychotherapy communication technique that involves recognizing, understanding, and explicitly acknowledging another person's feelings and actions, which strengthens alliance and reduces negative affect. To maximize the emotional support provided by validation, it is crucial to deliver it with appropriate timing and frequency. This study investigates validation timing detection from the speech perspective. Leveraging both paralinguistic and emotional information, we propose a paralinguistic- and emotion-aware model for validation timing detection without relying on textual context. Specifically, we first conduct continued self-supervised training and fine-tuning on different HuBERT backbones to obtain (i) a paralinguistics-aware Self-Supervised Learning (SSL) encoder and (ii) a multi-task speech emotion classification encoder. We then fuse these encoders and…
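The abstract is cut off before it says how the two encoders are fused, so the following is only a minimal sketch of one common option, not the authors' method. It assumes each encoder yields frame-level feature vectors, mean-pools them per utterance, concatenates the two pooled vectors, and passes the result through a single logistic layer to score whether validation should follow. The function names, feature sizes, random stand-in features, and untrained weights are all hypothetical.

```python
import math
import random


def mean_pool(frames):
    """Average a list of frame-level feature vectors into one utterance vector."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def fuse(paraling_vec, emotion_vec):
    """Fuse by concatenation (an assumption; the source does not state the fusion)."""
    return paraling_vec + emotion_vec


def validation_probability(fused, weights, bias):
    """Logistic layer: score in (0, 1) that validation should follow the utterance."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Random stand-ins for encoder outputs; real features would come from the
# paralinguistics-aware and emotion-classification HuBERT encoders.
rng = random.Random(0)
paraling_frames = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
emotion_frames = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(10)]

fused = fuse(mean_pool(paraling_frames), mean_pool(emotion_frames))
weights = [0.1] * len(fused)  # hypothetical, untrained weights
prob = validation_probability(fused, weights, bias=0.0)
print(len(fused), round(prob, 3))
```

In a trained system the weights would be learned from labeled validation timings, and the decision would typically compare the score against a tuned threshold.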
Taxonomy
Topics: Emotion and Mood Recognition · Mental Health via Writing · Speech Recognition and Synthesis
