Automatic Data Expansion for Customer-care Spoken Language Understanding
Shahab Jalalvand, Andrej Ljolje, Srinivas Bangalore

TL;DR
This paper presents an efficient data expansion method for customer-care spoken language understanding. It uses n-gram features to select useful samples from out-of-domain data, improving intent classification and reducing the classification error rate by 30%.
Contribution
The authors introduce a novel approach to expand in-domain data using n-grams and out-of-domain samples, enhancing SLU performance without extensive data collection.
Findings
Reduces classification error rate by 30%
Outperforms semi-supervised, TF-IDF, and embedding-based methods
Effective across diverse experimental setups
Abstract
Spoken language understanding (SLU) systems are widely used in handling of customer-care calls. A traditional SLU system consists of an acoustic model (AM) and a language model (LM) that are used to decode the utterance and a natural language understanding (NLU) model that predicts the intent. While AM can be shared across different domains, LM and NLU models need to be trained specifically for every new task. However, preparing enough data to train these models is prohibitively expensive. In this paper, we introduce an efficient method to expand the limited in-domain data. The process starts with training a preliminary NLU model based on logistic regression on the in-domain data. Since the features are based on n-grams (n = 1, 2), we can detect the most informative n-grams for each intent class. Using these n-grams, we find the samples in the out-of-domain corpus that 1) contain the desired n-gram…
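The first steps named in the abstract (a preliminary logistic-regression classifier on 1- and 2-gram features, the most informative n-grams per intent, and a lookup of out-of-domain samples that contain them) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the function names (`ngrams`, `train_logreg`, `informative_ngrams`, `select_candidates`), the toy utterances, the use of the largest learned weights as the measure of "most informative", and the choice to tag each selected sample with the intent whose n-gram it matched. Only the first selection criterion is covered, because the abstract is truncated after it. This is not the authors' implementation.

```python
import math
from collections import defaultdict

def ngrams(text, n_max=2):
    """Unique 1..n_max-grams of a lowercased, whitespace-tokenized utterance."""
    toks = text.lower().split()
    grams = {" ".join(toks[i:i + n])
             for n in range(1, n_max + 1)
             for i in range(len(toks) - n + 1)}
    return sorted(grams)  # sorted so iteration order is deterministic

def train_logreg(texts, intents, epochs=50, lr=0.1):
    """Softmax (multinomial logistic) regression on binary n-gram features,
    trained with plain stochastic gradient ascent on the log-likelihood."""
    classes = sorted(set(intents))
    weights = {c: defaultdict(float) for c in classes}
    for _ in range(epochs):
        for text, gold in zip(texts, intents):
            feats = ngrams(text)
            scores = {c: sum(weights[c][f] for f in feats) for c in classes}
            top = max(scores.values())  # subtract the max for numerical stability
            exps = {c: math.exp(s - top) for c, s in scores.items()}
            total = sum(exps.values())
            for c in classes:
                grad = (1.0 if c == gold else 0.0) - exps[c] / total
                for f in feats:
                    weights[c][f] += lr * grad
    return weights

def informative_ngrams(weights, k=2):
    """Assumption: 'most informative' = the k largest weights per intent."""
    return {c: [g for g, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
            for c, w in weights.items()}

def select_candidates(out_of_domain, top):
    """Keep out-of-domain utterances that contain an informative n-gram.
    Returns (utterance, matched intent, matched n-gram) triples."""
    picked = []
    for text in out_of_domain:
        feats = set(ngrams(text))
        for intent in sorted(top):
            hits = [g for g in top[intent] if g in feats]
            if hits:
                picked.append((text, intent, hits[0]))
                break
    return picked

# Toy data, invented for this sketch.
in_domain = [
    ("i want to pay my bill", "billing"),
    ("question about my bill", "billing"),
    ("my internet is down", "tech"),
    ("internet connection keeps dropping", "tech"),
]
out_of_domain = ["the bill arrived late", "what a nice day", "internet cafe nearby"]

texts, intents = zip(*in_domain)
model = train_logreg(texts, intents)
top = informative_ngrams(model, k=2)
candidates = select_candidates(out_of_domain, top)
print(top)
print(candidates)
```

On real data the candidate list would still need the further filtering the abstract begins to describe, since matching a single n-gram such as "bill" also pulls in utterances with an unrelated intent.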
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Attention Model
