A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets
Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas, Boaz Carmeli, Ron Hoory, Brian Kingsbury

TL;DR
This paper introduces the Nearest Neighbors Scores Improvement (NNSI) algorithm, an automatic data augmentation method for intent classifiers in voice systems. It selects highly ambiguous samples and labels them automatically, which reduces manual labeling effort and improves classification accuracy.
Contribution
The paper presents the NNSI algorithm, a novel automatic data selection and labeling method that improves intent classifier performance in spoken conversation datasets.
Findings
Reduced classifier error rates by up to 10%.
Effectively selected and labeled ambiguous samples.
Improved intent classification accuracy in real-world systems.
Abstract
Intent classifiers are vital to the successful operation of virtual agent systems. This is especially so in voice activated systems where the data can be noisy with many ambiguous directions for user intents. Before operation begins, these classifiers are generally lacking in real-world training data. Active learning is a common approach used to help label large amounts of collected user input. However, this approach requires many hours of manual labeling work. We present the Nearest Neighbors Scores Improvement (NNSI) algorithm for automatic data selection and labeling. The NNSI reduces the need for manual labeling by automatically selecting highly-ambiguous samples and labeling them with high accuracy. This is done by integrating the classifier's output from a semantically similar group of text samples. The labeled samples can then be added to the training set to improve the accuracy…
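The abstract describes labeling an ambiguous sample by integrating the classifier's output over a group of semantically similar samples. The sketch below is one possible reading of that idea, not the authors' implementation. The function name `nn_score_pooling`, the use of cosine similarity over sentence embeddings, plain averaging of the scores, and both thresholds are assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def nn_score_pooling(embeddings, probs, k=2, ambiguity_threshold=0.6,
                     accept_threshold=0.6):
    """Hypothetical sketch: relabel ambiguous samples from neighbor scores.

    embeddings: (n, d) array of sentence embeddings (assumed given).
    probs:      (n, c) array of classifier class probabilities.
    Returns {sample_index: predicted_label} for ambiguous samples whose
    pooled score clears accept_threshold.
    """
    # Cosine similarity between all pairs of samples.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a sample is not its own neighbor

    labels = {}
    # A sample counts as ambiguous when its top score is low (assumption).
    ambiguous = np.where(probs.max(axis=1) < ambiguity_threshold)[0]
    for i in ambiguous:
        neighbors = np.argsort(sim[i])[-k:]  # k most similar samples
        # Average the scores of the sample and its neighbors (assumption).
        pooled = probs[np.append(neighbors, i)].mean(axis=0)
        if pooled.max() >= accept_threshold:
            labels[int(i)] = int(pooled.argmax())
    return labels


# Toy data: sample 0 is ambiguous, samples 1 and 2 are close to it and
# confidently class 0, sample 3 is far away and confidently class 1.
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0]])
probs = np.array([[0.55, 0.45], [0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
result = nn_score_pooling(embeddings, probs, k=2)
print(result)  # {0: 0}
```

In this toy run only sample 0 falls below the ambiguity threshold. Its pooled score over itself and its two nearest neighbors is [0.75, 0.25], so it receives label 0 and could then be added to the training set.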
Taxonomy
Topics: Speech and dialogue systems · Fuzzy Logic and Control Systems · Natural Language Processing Techniques
