Active Learning Over Multiple Domains in Natural Language Tasks
Shayne Longpre, Julia Reisler, Edward Greg Huang, Yi Lu, Andrew Frank, Nikhil Ramesh, Chris DuBois

TL;DR
This paper investigates active learning across multiple domains in natural language tasks, proposing a new method and analyzing various techniques to improve data selection in multi-source, out-of-distribution scenarios.
Contribution
It introduces DAL-E, a novel variant of H-Divergence methods, and provides a comprehensive analysis of active learning strategies for multi-domain NLP tasks.
Findings
DAL-E averages 2-3% improvements over the random baseline.
Diverse domain allocation enhances active learning effectiveness.
Existing methods have room for improvement in multi-domain settings.
Abstract
Studies of active learning traditionally assume the target and source data stem from a single domain. However, in realistic applications, practitioners often require active learning with multiple sources of out-of-distribution data, where it is unclear a priori which data sources will help or hurt the target domain. We survey a wide variety of techniques in active learning (AL), domain shift detection (DS), and multi-domain sampling to examine this challenging setting for question answering and sentiment analysis. We ask (1) which families of methods are effective for this task, and (2) which properties of selected examples and domains achieve strong results. Among 18 acquisition functions from 4 families of methods, we find H-Divergence methods, and particularly our proposed variant DAL-E, yield effective results, averaging 2-3% improvements over the random baseline. We also show the…
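To make the idea of an H-Divergence-style acquisition function more concrete, the sketch below shows one common reading of that family: train a classifier to tell source-pool examples from target examples, then pick the source examples the classifier finds most target-like. This is an illustrative assumption, not the paper's DAL-E method, whose details are not given here. The function names, the logistic-regression discriminator, and the 2-D toy features are all hypothetical; in practice the features would presumably be model embeddings.

```python
import numpy as np


def train_domain_discriminator(source_x, target_x, steps=500, lr=0.1):
    """Fit a logistic regression separating source (label 0) from target (label 1).

    Hypothetical helper for illustration; plain gradient descent on log loss.
    """
    x = np.vstack([source_x, target_x])
    y = np.concatenate([np.zeros(len(source_x)), np.ones(len(target_x))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(target | x)
        grad = p - y                             # gradient of log loss w.r.t. logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def select_target_like(source_x, target_x, budget):
    """Return indices of the `budget` source examples scored most target-like."""
    w, b = train_domain_discriminator(source_x, target_x)
    scores = 1.0 / (1.0 + np.exp(-(source_x @ w + b)))
    return np.argsort(-scores)[:budget]


# Toy setup (assumed data): two source domains, one far from and one near the target.
rng = np.random.default_rng(0)
far_domain = rng.normal(loc=-3.0, scale=0.5, size=(50, 2))   # pool indices 0..49
near_domain = rng.normal(loc=2.0, scale=0.5, size=(50, 2))   # pool indices 50..99
source_pool = np.vstack([far_domain, near_domain])
target = rng.normal(loc=2.5, scale=0.5, size=(30, 2))

chosen = select_target_like(source_pool, target, budget=10)
print(sorted(int(i) for i in chosen))
```

In this toy setup the selected indices should come from the near domain, which illustrates how a discriminator-based score can steer the labeling budget toward source domains that resemble the target. A random baseline, by contrast, would draw from both source domains without regard to their distance from the target.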
Taxonomy
Topics: Machine Learning and Algorithms · Speech Recognition and Synthesis · Natural Language Processing Techniques
