Attentive Contextual Carryover for Multi-Turn End-to-End Spoken Language Understanding
Kai Wei, Thanh Tran, Feng-Ju Chang, Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Jing Liu, Anirudh Raju, Ross McGowan, Nathan Susanj, Ariya Rastrow, Grant P. Strimel

TL;DR
This paper introduces a multi-turn contextual E2E SLU model using multi-head attention over dialogue history, significantly reducing error rates in spoken language understanding tasks.
Contribution
It proposes a novel attention-based architecture that effectively incorporates dialogue context into end-to-end SLU models, enhancing multi-turn understanding performance.
Findings
Reduces average word error rate by 10.8% on a large de-identified dataset of voice assistant utterances.
Decreases average semantic error rate by 12.6% on the same dataset.
Improves performance over noncontextual baselines.
Abstract
Recent years have seen significant advances in end-to-end (E2E) spoken language understanding (SLU) systems, which directly predict intents and slots from spoken audio. While dialogue history has been exploited to improve conventional text-based natural language understanding systems, current E2E SLU approaches have not yet incorporated such critical contextual signals in multi-turn and task-oriented dialogues. In this work, we propose a contextual E2E SLU model architecture that uses a multi-head attention mechanism over encoded previous utterances and dialogue acts (actions taken by the voice assistant) of a multi-turn dialogue. We detail alternative methods to integrate these contexts into the state-of-the-art recurrent and transformer-based models. When applied to a large de-identified dataset of utterances collected by a voice assistant, our method reduces average word and semantic…
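The core idea in the abstract, multi-head attention over encoded previous utterances and dialogue acts, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `multi_head_context_attention`, the tensor shapes, the random projection weights, and the final concatenation step are all assumptions chosen for illustration. The abstract mentions several alternative integration methods without detailing them here; concatenation is just one plausible choice.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_context_attention(query, context, w_q, w_k, w_v, w_o, num_heads):
    """Attend from current-turn frames to dialogue context (hypothetical helper).

    query:   (T, d) encoded frames of the current utterance
    context: (N, d) encoded previous utterances and dialogue acts
    Returns a (T, d) context vector per frame and the (heads, T, N) weights.
    """
    T, d = query.shape
    N = context.shape[0]
    d_head = d // num_heads
    # Project, then split the feature dimension into heads: (heads, len, d_head).
    q = (query @ w_q).reshape(T, num_heads, d_head).transpose(1, 0, 2)
    k = (context @ w_k).reshape(N, num_heads, d_head).transpose(1, 0, 2)
    v = (context @ w_v).reshape(N, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, T, N)
    weights = softmax(scores, axis=-1)
    attended = weights @ v                                 # (heads, T, d_head)
    # Merge heads back to (T, d) and apply the output projection.
    merged = attended.transpose(1, 0, 2).reshape(T, d)
    return merged @ w_o, weights

# Toy inputs: all sizes and values are made up for the example.
rng = np.random.default_rng(0)
d, num_heads = 8, 2
query = rng.normal(size=(5, d))            # 5 frames of the current turn
prev_utterances = rng.normal(size=(2, d))  # 2 encoded previous utterances
dialogue_acts = rng.normal(size=(2, d))    # 2 encoded dialogue acts
context = np.concatenate([prev_utterances, dialogue_acts], axis=0)
w_q, w_k, w_v, w_o = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4))

context_vec, weights = multi_head_context_attention(
    query, context, w_q, w_k, w_v, w_o, num_heads
)
# Assumed integration step: concatenate the context vector onto each frame.
fused = np.concatenate([query, context_vec], axis=-1)
```

In this sketch each frame of the current turn receives its own weighted summary of the dialogue history, and `weights` shows how much each head attends to each context item. A trained model would learn the projection matrices rather than sample them randomly.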
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Softmax · Linear Layer
