TL;DR
This paper presents a multimodal model that detects repair initiation in Dutch dialogues by combining linguistic and prosodic features, with the aim of helping conversational agents recognize user repair requests and avoid breakdowns.
Contribution
It introduces a multimodal approach, grounded in Conversation Analysis, that detects repair initiation by integrating prosodic cues with linguistic features.
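As a rough illustration of what integrating the two feature types can mean, the sketch below concatenates a linguistic vector and a prosodic vector and scores the result with a logistic function. This is not the paper's architecture: the feature names, the weights, the bias, and the helpers `fuse` and `repair_probability` are all hypothetical and were chosen only to make the idea concrete.

```python
import math
from typing import List, Sequence


def fuse(linguistic: Sequence[float], prosodic: Sequence[float]) -> List[float]:
    """Feature-level fusion: concatenate the two modality vectors."""
    return [float(x) for x in linguistic] + [float(x) for x in prosodic]


def repair_probability(
    features: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Logistic score for 'this turn initiates repair'."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, hand-picked values for illustration only (not from the paper).
linguistic = [1.0, 0.0]    # e.g. contains a question word, is a partial repeat
prosodic = [0.8]           # e.g. normalized final pitch rise
weights = [1.2, 0.9, 1.5]  # made-up weights, one per fused feature
bias = -1.0

fused = fuse(linguistic, prosodic)
text_only = repair_probability(linguistic, weights[:2], bias)
combined = repair_probability(fused, weights, bias)
print(f"text-only: {text_only:.3f}, combined: {combined:.3f}")
```

With these made-up numbers the prosodic feature raises the score, but that is a property of the chosen weights, not evidence for the paper's result. In practice the weights would be learned from annotated dialogue data.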
Findings
Prosodic cues complement linguistic features in detecting repair initiation.
Combining these features significantly improves on the results of pretrained text and audio embeddings.
The results offer insights into how different features interact.
Abstract
Maintaining mutual understanding is key to avoiding breakdowns in human-human conversation, and repair, particularly Other-Initiated Repair (OIR, in which one speaker signals trouble and prompts the other to resolve it), plays a vital role. However, Conversational Agents (CAs) still fail to recognize user repair initiation, leading to breakdowns or disengagement. This work proposes a multimodal model to automatically detect repair initiation in Dutch dialogues by integrating linguistic and prosodic features grounded in Conversation Analysis. The results show that prosodic cues complement linguistic features and significantly improve the results of pretrained text and audio embeddings, offering insights into how different features interact. Future directions include incorporating visual cues, exploring multilingual and cross-context corpora to assess the robustness…