Active Learning for New Domains in Natural Language Understanding
Stanislav Peshterliev, John Kearney, Abhyuday Jagannatha, Imre Kiss, Spyros Matsoukas

TL;DR
This paper introduces Majority-CRF, an active learning (AL) algorithm that improves natural language understanding (NLU) accuracy for new domains by choosing which utterances to annotate. It yields a 6.6%-9% relative error rate reduction over random sampling at the same annotation budget.
Contribution
The paper presents Majority-CRF, an ensemble-based active learning method for adding new domains to NLU systems. It shows statistically significant improvements over other AL approaches.
Findings
Achieves 6.6%-9% relative error rate reduction over random sampling with the same annotation budget (three domains).
Statistically significant improvements over other active learning methods.
Human-in-the-loop AL case studies on six new domains show 4.6%-9% improvement on an existing NLU system.
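A relative error rate reduction is a fraction of the baseline error rate, not a difference in percentage points. The sketch below assumes the standard definition, (baseline error − new error) / baseline error. The function name `relative_error_reduction` is hypothetical, and the example error rates are made up for illustration. They are not figures from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction by which new_error improves on baseline_error (standard definition assumed)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    # A positive result means new_error is lower than the baseline.
    return (baseline_error - new_error) / baseline_error


# Hypothetical example: a random-sampling baseline at 20% error and an AL run at 18.6% error.
print(f"{relative_error_reduction(0.20, 0.186):.1%}")
```

Under this definition, a 6.6% relative reduction from a 20% baseline error rate lowers the error to about 18.7%, not to 13.4%.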
Abstract
We explore active learning (AL) for improving the accuracy of new domains in a natural language understanding (NLU) system. We propose an algorithm called Majority-CRF that uses an ensemble of classification models to guide the selection of relevant utterances, as well as a sequence labeling model to help prioritize informative examples. Experiments with three domains show that Majority-CRF achieves 6.6%-9% relative error rate reduction compared to random sampling with the same annotation budget, and statistically significant improvements compared to other AL approaches. Additionally, case studies with human-in-the-loop AL on six new domains show 4.6%-9% improvement on an existing NLU system.
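The abstract describes two stages: an ensemble of classifiers guides the selection of relevant utterances, and a sequence labeling model prioritizes informative ones. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes three things. First, each ensemble member is a binary vote on whether an unlabeled utterance belongs to the new domain. Second, "relevant" means a strict majority votes yes. Third, the sequence labeler (a CRF, going by the algorithm's name) exposes its confidence in its best labeling, and the least confident candidates are annotated first. The names `majority_crf_select`, `classifiers`, and `crf_confidence` are hypothetical.

```python
from typing import Callable, List, Sequence


def majority_crf_select(
    pool: Sequence[str],
    classifiers: Sequence[Callable[[str], bool]],
    crf_confidence: Callable[[str], float],
    budget: int,
) -> List[str]:
    """Hypothetical sketch: pick up to `budget` utterances to send for annotation."""
    # Stage 1 (assumed): keep utterances that a strict majority of the
    # ensemble votes as belonging to the new domain.
    majority = len(classifiers) // 2 + 1
    candidates = [
        u for u in pool
        if sum(1 for clf in classifiers if clf(u)) >= majority
    ]
    # Stage 2 (assumed): rank candidates by the sequence labeler's confidence
    # in its best label sequence, least confident first.
    candidates.sort(key=crf_confidence)
    return candidates[:budget]


# Toy usage with keyword "classifiers" and a lookup table standing in for CRF scores.
pool = ["play some jazz", "set an alarm", "play the song hello", "what is the weather"]
classifiers = [
    lambda u: "play" in u,
    lambda u: "song" in u or "jazz" in u,
    lambda u: "alarm" in u,
]
conf = {"play some jazz": 0.9, "play the song hello": 0.4}
print(majority_crf_select(pool, classifiers, lambda u: conf[u], budget=1))
```

The two stages do different jobs. The ensemble vote keeps the annotation budget from being spent on utterances unrelated to the new domain. The uncertainty ranking then spends it on the in-domain examples the current model handles worst.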
Taxonomy
Topics: Machine Learning and Algorithms · Natural Language Processing Techniques · Topic Modeling
