Efficient Semi-Supervised Learning for Natural Language Understanding by Optimizing Diversity

Eunah Cho; He Xie; John P. Lalor; Varun Kumar; William M. Campbell

arXiv:1910.04196·cs.CL·October 11, 2019

Open Access

TL;DR

This paper presents a functionality-specific semi-supervised learning approach for single-turn task-oriented dialogue systems. Self-training augments the training data automatically from unlabeled utterances, and diversity-aware selection of the augmented utterances reduces the amount of data needed for training.

Contribution

It introduces a functionality-specific self-training method combined with diversity optimization techniques for more efficient natural language understanding.

Findings

01. Functionality-specific self-training improves system performance.

02. Diversity optimization can reduce training data to 50% in many cases.

03. Methods maintain performance with less labeled data.

Abstract

Expanding new functionalities efficiently is an ongoing challenge for single-turn task-oriented dialogue systems. In this work, we explore functionality-specific semi-supervised learning via self-training. We consider methods that augment training data automatically from unlabeled data sets in a functionality-targeted manner. In addition, we examine multiple techniques for efficient selection of augmented utterances to reduce training time and increase diversity. First, we consider paraphrase detection methods that attempt to find utterance variants of labeled training data with good coverage. Second, we explore submodular optimization based on n-gram features for utterance selection. Experiments show that functionality-specific self-training is very effective for improving system performance. In addition, methods optimizing diversity can reduce training data in many cases to 50% with…
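
The n-gram-based submodular selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact objective, so the example below assumes a simple n-gram coverage function (the number of distinct unigrams and bigrams covered by the selected set), which is monotone and submodular, and maximizes it greedily under a size budget. The names `ngrams` and `greedy_select` and the parameters `budget` and `n_max` are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple


def ngrams(utterance: str, n_max: int = 2) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams (n = 1..n_max) of a whitespace-tokenized utterance."""
    tokens = utterance.lower().split()
    return {
        tuple(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def greedy_select(utterances: List[str], budget: int, n_max: int = 2) -> List[str]:
    """Greedily pick up to `budget` utterances that maximize n-gram coverage.

    Assumption: the objective is the count of distinct n-grams covered,
    a stand-in for whatever submodular function the paper actually uses.
    """
    features = [ngrams(u, n_max) for u in utterances]
    covered: Set[Tuple[str, ...]] = set()
    selected: List[int] = []
    remaining = set(range(len(utterances)))

    while remaining and len(selected) < budget:
        # Marginal gain = number of n-grams not yet covered; ties go to the lowest index.
        best = max(sorted(remaining), key=lambda i: len(features[i] - covered))
        if not features[best] - covered:
            break  # nothing left adds new n-grams, so stop early
        selected.append(best)
        covered |= features[best]
        remaining.remove(best)

    return [utterances[i] for i in selected]


if __name__ == "__main__":
    pool = [
        "play some jazz music",
        "play some jazz music",
        "turn off the kitchen lights",
        "play jazz",
    ]
    print(greedy_select(pool, budget=2))
```

With this objective, an exact duplicate of an already selected utterance has zero marginal gain and is never picked, which is the sense in which such a selection favors diversity over volume.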

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems