A neural prosody encoder for end-to-end dialogue act classification
Kai Wei, Dillon Knox, Martin Radfar, Thanh Tran, Markus Muller, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, Maurizio Omologo

TL;DR
This paper introduces a neural end-to-end dialogue act classification model that effectively integrates prosodic features using a learnable gating mechanism, leading to improved accuracy across multiple datasets.
Contribution
The paper presents a novel neural architecture with a learnable gating mechanism for prosody integration in end-to-end dialogue act classification.
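This summary does not give the exact form of the gate. The following is a minimal sketch of one common gating pattern, not necessarily the authors' design. It assumes a sigmoid gate computed from the concatenation of an acoustic frame embedding and a prosodic feature vector, and it scales each prosodic feature by its gate value before fusion. `ProsodyGate`, its dimensions, and the random initialization are hypothetical. Only the forward pass is shown; in a real model the weights would be learned jointly with the DAC objective by backpropagation.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class ProsodyGate:
    """Hypothetical element-wise gate deciding how much prosody to keep.

    g = sigmoid(W [a; p] + b) has one value in (0, 1) per prosodic feature;
    the fused frame representation is [a; g * p].
    """

    def __init__(self, acoustic_dim: int, prosody_dim: int, seed: int = 0):
        rng = random.Random(seed)
        in_dim = acoustic_dim + prosody_dim
        # Small random weights; in a trained model these would be learned.
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                       for _ in range(prosody_dim)]
        self.bias = [0.0] * prosody_dim

    def gate(self, acoustic, prosody):
        """Return one gate value per prosodic feature."""
        x = list(acoustic) + list(prosody)
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.weight, self.bias)]

    def __call__(self, acoustic, prosody):
        g = self.gate(acoustic, prosody)
        # Scale each prosodic feature by its gate: near 0 suppresses, near 1 keeps.
        gated = [gi * pi for gi, pi in zip(g, prosody)]
        return list(acoustic) + gated


# Example: fuse a 4-dim acoustic embedding with (energy, pitch) features.
fuse = ProsodyGate(acoustic_dim=4, prosody_dim=2)
fused_frame = fuse([0.2, -0.1, 0.5, 0.3], [0.7, 1.8])
# fused_frame has 4 + 2 = 6 values; the last two are the gated prosody.
```

Because the gate depends on both the acoustic and the prosodic input, the model can down-weight prosody for frames where it carries little information and pass it through where it matters. This is one reading of "assesses the importance of prosodic features and selectively retains core information."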
Findings
Achieves a 1.07% absolute improvement in dialogue act classification accuracy
Effectively models prosodic phenomena co-occurring at different levels within an utterance
Demonstrates gains across three publicly available benchmark datasets
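As an illustration of "different levels," the sketch below summarizes one frame-level prosodic contour at three hypothetical granularities: raw frames, fixed-length segments, and the whole utterance. The fixed segment length, the mean pooling, and the chosen statistics (mean, min, max, final-minus-initial change) are illustrative assumptions and not the paper's encoder. A neural model would typically learn such aggregations rather than hard-code them.

```python
from statistics import mean


def pool_segments(contour, seg_len):
    """Segment level: mean of consecutive, non-overlapping windows of frames."""
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    return [mean(contour[i:i + seg_len]) for i in range(0, len(contour), seg_len)]


def utterance_stats(contour):
    """Utterance level: global statistics of the whole contour."""
    return {
        "mean": mean(contour),
        "min": min(contour),
        "max": max(contour),
        "delta": contour[-1] - contour[0],  # > 0 means the contour rises overall
    }


def multilevel_features(contour, seg_len=3):
    """Describe one frame-level contour at frame, segment, and utterance level."""
    return {
        "frame": list(contour),
        "segment": pool_segments(contour, seg_len),
        "utterance": utterance_stats(contour),
    }


# Hypothetical pitch contour in Hz (one value per frame) that rises at the end.
pitch = [120.0, 122.0, 125.0, 130.0, 140.0, 155.0, 170.0]
levels = multilevel_features(pitch, seg_len=3)
# levels["segment"] has 3 values (windows of 3, 3, and 1 frames);
# levels["utterance"]["delta"] is 50.0.
```

A rising terminal contour like this one is a classic cue for yes/no questions. Here it shows up directly in the utterance-level `delta`, while the segment level shows where in the utterance the rise happens.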
Abstract
Dialogue act classification (DAC) is a critical task for spoken language understanding in dialogue systems. Prosodic features such as energy and pitch have been shown to be useful for DAC. Despite their importance, little research has explored neural approaches to integrating prosodic features into end-to-end (E2E) DAC models, which infer dialogue acts directly from audio signals. In this work, we propose an E2E neural architecture designed to characterize prosodic phenomena co-occurring at different levels within an utterance. A novel part of this architecture is a learnable gating mechanism that assesses the importance of prosodic features and selectively retains the core information necessary for E2E DAC. Our proposed model improves DAC accuracy by 1.07% absolute across three publicly available benchmark datasets.
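Energy and pitch, the two prosodic features named in the abstract, are usually computed per short frame before any neural encoding. This summary does not describe the paper's actual feature pipeline. The rough sketch below assumes 16 kHz mono audio, 25 ms frames with a 10 ms hop, RMS energy, and a plain autocorrelation F0 estimator with a 50 to 400 Hz search range and an arbitrary voicing threshold. All function names and parameter values are hypothetical.

```python
import math


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping fixed-length frames."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def rms_energy(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def autocorr_pitch(frame, sample_rate, fmin=50.0, fmax=400.0, voicing_threshold=0.3):
    """Crude F0 estimate in Hz via autocorrelation; 0.0 marks unvoiced frames."""
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return 0.0  # silent frame
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_r = 0, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    # Weak periodicity relative to frame energy: treat as unvoiced.
    if best_r / r0 < voicing_threshold:
        return 0.0
    return sample_rate / best_lag


def prosodic_features(signal, sample_rate=16000):
    """Per-frame (energy, pitch) pairs using 25 ms frames and a 10 ms hop."""
    frame_len = round(0.025 * sample_rate)
    hop = round(0.010 * sample_rate)
    return [(rms_energy(f), autocorr_pitch(f, sample_rate))
            for f in frame_signal(signal, frame_len, hop)]


# Example: 0.1 s of a synthetic 200 Hz tone at 16 kHz (not real speech).
sr = 16000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 10)]
features = prosodic_features(tone, sr)
# Each entry is (energy, pitch); pitch should come out at 200.0 Hz here.
```

The unnormalized autocorrelation used here is easy to follow but prone to octave errors on real speech. Production systems generally rely on more robust pitch trackers such as YIN or RAPT.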
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Natural Language Processing Techniques
