Enhancing Out-Of-Domain Utterance Detection with Data Augmentation Based on Word Embeddings

Yueqi Feng; Jiali Lin

arXiv:1911.10439·cs.CL·March 30, 2020·1 cites

Open Access

TL;DR

This paper explores how data augmentation using word embeddings can improve out-of-domain utterance detection in intelligent assistants, especially with limited OOD data, by increasing sample diversity and dispersion.

Contribution

It introduces a sampling-based data augmentation method that enhances OOD detection accuracy by increasing the dispersion of OOD samples.

Findings

1. Dispersed OOD samples improve detection performance
2. Augmentation benefits are more significant with small sample sizes
3. Random sampling increases coverage of unknown OOD space

Abstract

For most intelligent assistant systems, it is essential to have a mechanism that automatically detects out-of-domain (OOD) utterances so that noisy input is handled properly. One typical approach is to add a separate class of OOD utterance examples to the classifier alongside the in-domain text samples. However, because OOD utterances are usually absent from the training data, detection performance depends largely on the quality of the attached OOD text data, whose sample size is restricted by computing limits. In this paper, we study how augmented OOD data based on sampling affects OOD utterance detection with a small sample size. We hypothesize that randomly chosen OOD utterance samples can increase the coverage of the unknown OOD utterance space and enhance detection accuracy if they are more dispersed. Experiments show that given the same dataset with the same OOD…
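
The dispersion hypothesis in the abstract can be made concrete with a small sketch. This is not the authors' method, which the truncated abstract does not specify. It assumes that an utterance is embedded as the average of its word vectors and that "dispersion" means the mean pairwise Euclidean distance between utterance embeddings. The toy 2-D embeddings and the helper names `embed_utterance`, `dispersion`, and `sample_dispersed` are hypothetical and exist only for illustration.

```python
import math
import random
from itertools import combinations

# Hypothetical toy word embeddings (2-D for readability); a real system
# would load pretrained word vectors instead.
TOY_EMBEDDINGS = {
    "play": (0.9, 0.1),
    "music": (0.8, 0.2),
    "weather": (0.1, 0.9),
    "today": (0.2, 0.8),
    "pizza": (-0.7, 0.3),
    "order": (-0.6, 0.4),
    "stock": (0.0, -0.9),
    "price": (0.1, -0.8),
}


def embed_utterance(utterance):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [
        TOY_EMBEDDINGS[w] for w in utterance.lower().split() if w in TOY_EMBEDDINGS
    ]
    if not vectors:
        raise ValueError(f"no known words in {utterance!r}")
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def dispersion(utterances):
    """Assumed dispersion measure: mean pairwise Euclidean distance."""
    points = [embed_utterance(u) for u in utterances]
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def sample_dispersed(pool, k, trials=20, seed=0):
    """Draw `trials` random size-k subsets and keep the most dispersed one."""
    rng = random.Random(seed)
    candidates = [rng.sample(pool, k) for _ in range(trials)]
    return max(candidates, key=dispersion)
```

For example, `sample_dispersed(["play music", "weather today", "order pizza", "stock price"], 2)` returns the random pair whose embeddings lie furthest apart among the subsets tried. Picking the best of several random draws is one simple design choice under these assumptions; other dispersion measures or selection strategies would fit the same hypothesis.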

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques