Capture Salient Historical Information: A Fast and Accurate Non-Autoregressive Model for Multi-turn Spoken Language Understanding
Lizhi Cheng, Weijia Jia, Wenmian Yang

TL;DR
This paper introduces SHA-LRT, a novel non-autoregressive model for multi-turn spoken language understanding that captures salient historical information efficiently, significantly improving accuracy and inference speed over existing methods.
Contribution
The paper proposes SHA-LRT, a new model combining salient history attention, layer refinement, and slot label generation for fast, accurate multi-turn SLU.
Findings
Achieves 17.5% improvement in overall SLU performance
Accelerates inference nearly 15 times compared to baselines
Effective on both multi-turn and single-turn SLU tasks
Abstract
Spoken Language Understanding (SLU), a core component of task-oriented dialogue systems, needs short inference times because human users are impatient. Existing work increases inference speed by designing non-autoregressive models for single-turn SLU tasks, but these models do not carry over to multi-turn SLU, which must handle the dialogue history. The intuitive idea is to concatenate all historical utterances and apply the non-autoregressive models directly. However, this approach largely misses the salient historical information and suffers from the uncoordinated-slot problem. To overcome those shortcomings, we propose a novel model for multi-turn SLU named Salient History Attention with Layer-Refined Transformer (SHA-LRT), which consists of an SHA module, a Layer-Refined Mechanism (LRM), and a Slot Label Generation (SLG) task. SHA captures salient historical information for the current…
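The abstract is cut off before it describes SHA in detail, so the sketch below is not the paper's SHA module. It is a generic scaled dot-product attention step, shown only to illustrate the contrast the abstract draws: weighting each historical utterance by its relevance to the current one instead of concatenating all of them with equal standing. The function names, the toy 2-dimensional "utterance embeddings", and their values are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_history(current, history):
    """Weight each history vector by its scaled dot product with `current`.

    Hypothetical helper, not taken from the paper: returns the attention
    weights and the weighted sum of the history vectors.
    """
    d = len(current)
    scores = [
        sum(c * h for c, h in zip(current, vec)) / math.sqrt(d)
        for vec in history
    ]
    weights = softmax(scores)
    summary = [
        sum(w * vec[i] for w, vec in zip(weights, history))
        for i in range(d)
    ]
    return weights, summary


# Made-up embeddings: turns 1 and 3 resemble the current utterance, turn 2 does not.
history = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
current = [1.0, 0.0]
weights, summary = attend_history(current, history)
```

In this toy setup the first and third history turns receive more weight than the second, which is the kind of selectivity that plain concatenation does not provide.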
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Intelligent Tutoring Systems and Adaptive Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Softmax · Residual Connection · Adam · Byte Pair Encoding · Layer Normalization · Absolute Position Encodings
