Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems
Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury, George Saon

TL;DR
This paper introduces a text-based training method for spoken language understanding (SLU) systems that reduces the need for large labeled speech datasets; with only a small amount of additional speech, the resulting models perform close to systems trained on the full speech data.
Contribution
A new text representation and training approach enabling end-to-end SLU systems to be built primarily from text data, with minimal speech data required for high performance.
Findings
With text-only training, achieves up to 90% of the performance obtained with full speech training.
Improves to 97% of full-speech performance when just 10% of the speech data is added.
Demonstrated on three SLU datasets covering both intent and entity tasks.
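The percentages above express a system's score relative to a baseline trained on the full speech data. A minimal sketch of that arithmetic follows, assuming the comparison is a simple ratio of task scores. The function name `relative_performance` and all numeric values are hypothetical, chosen only to illustrate the calculation; they are not results from the paper.

```python
def relative_performance(score: float, full_speech_score: float) -> float:
    """Return `score` as a percentage of the full-speech baseline score."""
    if full_speech_score <= 0:
        raise ValueError("full_speech_score must be positive")
    return 100.0 * score / full_speech_score


# Hypothetical task scores (for example accuracy or F1), for illustration only.
full_speech = 80.0             # model trained on all available speech
text_only = 72.0               # model trained on text alone
text_plus_some_speech = 77.6   # text-trained model plus a small share of speech

for name, score in [
    ("text only", text_only),
    ("text + some speech", text_plus_some_speech),
]:
    pct = relative_performance(score, full_speech)
    print(f"{name}: {pct:.1f}% of full-speech performance")
```

With these made-up scores, the two systems reach roughly 90% and 97% of the baseline, which is how figures of this kind are typically read.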
Abstract
The lack of speech data annotated with labels required for spoken language understanding (SLU) is often a major hurdle in building end-to-end (E2E) systems that can directly process speech inputs. In contrast, large amounts of text data with suitable labels are usually available. In this paper, we propose a novel text representation and training methodology that allows E2E SLU systems to be effectively constructed using these text resources. With very limited amounts of additional speech, we show that these models can be further improved to perform at levels close to similar systems built on the full speech datasets. The efficacy of our proposed approach is demonstrated on both intent and entity tasks using three different SLU datasets. With text-only training, the proposed system achieves up to 90% of the performance possible with full speech training. With just an additional 10% of…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
