Incremental processing of noisy user utterances in the spoken language understanding task
Stefan Constantin, Jan Niehues, Alex Waibel

TL;DR
This paper introduces a model-agnostic incremental processing method for spoken language understanding that reduces latency while maintaining high accuracy. It is evaluated on clean and noisy versions of the ATIS dataset, with F1-score improvements of up to 47.91 absolute percentage points.
Contribution
It presents an incremental processing approach for noisy user utterances that lets spoken language understanding systems start subactions while the user is still speaking, reducing reaction time.
Findings
Up to 47.91 absolute percentage points improvement in F1-score
Effective processing of noisy and partial utterances
Dataset creation method for low-latency NLU components
Abstract
State-of-the-art neural network architectures make it possible to create spoken language understanding systems with high quality and fast processing time. One major challenge for real-world applications is the high latency of these systems, caused by triggered actions with high execution times. If an action can be separated into subactions, the reaction time of the system can be improved by processing the user utterance incrementally and starting subactions while the utterance is still being spoken. In this work, we present a model-agnostic method to achieve high quality in processing incrementally produced partial utterances. Based on clean and noisy versions of the ATIS dataset, we show how to use our method to create datasets for training low-latency natural language understanding components. We achieve improvements of up to 47.91 absolute percentage points in F1-score.
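To make "incrementally produced partial utterances" concrete, here is a minimal sketch of one way to turn a single annotated utterance into training examples for partial input. It assumes a hypothetical scheme in which every word-level prefix becomes its own example, keeping the slot tags for the words seen so far and the intent of the full utterance. This is an illustration only, not necessarily the authors' exact dataset-creation method. The function name `prefix_dataset`, the `min_len` parameter, and the example utterance are hypothetical.

```python
from typing import List, Tuple

# One example: (prefix tokens, slot tags for those tokens, utterance intent)
Example = Tuple[List[str], List[str], str]


def prefix_dataset(tokens: List[str], tags: List[str], intent: str,
                   min_len: int = 1) -> List[Example]:
    """Expand one annotated utterance into all of its word-level prefixes.

    Hypothetical scheme: each prefix keeps the slot tags of the words it
    contains and inherits the intent label of the complete utterance.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    examples: List[Example] = []
    # Prefix lengths run from min_len up to the full utterance length.
    for end in range(min_len, len(tokens) + 1):
        examples.append((tokens[:end], tags[:end], intent))
    return examples


# Usage with an ATIS-style utterance (illustrative labels).
tokens = "show flights from boston to denver".split()
tags = ["O", "O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name"]
examples = prefix_dataset(tokens, tags, "atis_flight")
for prefix, prefix_tags, label in examples:
    print(" ".join(prefix), "|", " ".join(prefix_tags), "|", label)
```

Training on such prefixes exposes a model to the truncated inputs it sees during incremental decoding. Whether the prefix should carry the full utterance's intent or a separate "incomplete" label is a design choice this sketch does not settle.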
