Leveraging Variation Theory in Counterfactual Data Augmentation for Optimized Active Learning
Simret Araya Gebreegziabher, Kuangshi Ai, Zheng Zhang, Elena L. Glassman, Toby Jia-Jun Li

TL;DR
This paper presents a novel counterfactual data augmentation method inspired by Variation Theory to improve active learning efficiency, especially in low-data scenarios, by synthesizing artificial data points that highlight key features.
Contribution
It introduces a neuro-symbolic pipeline combining LLMs and rule-based models to generate synthetic data for active learning, addressing the cold start problem.
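The idea of keeping one feature fixed while varying everything else can be illustrated with a toy sketch. This is not the authors' pipeline: the regex stands in for a learned symbolic rule, the fixed templates stand in for an LLM generator, and the names `Example`, `generate_counterfactuals`, and `templates` are hypothetical and introduced only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    label: str


def generate_counterfactuals(example, pattern, templates):
    """Keep the phrase matched by `pattern` fixed and vary its context.

    `templates` maps a target label to a format string; this is an assumed
    stand-in for an LLM that would rewrite the sentence toward that label.
    """
    match = re.search(pattern, example.text, flags=re.IGNORECASE)
    if match is None:
        return []
    phrase = match.group(0)

    candidates = []
    for label, template in templates.items():
        if label == example.label:
            continue  # a counterfactual must carry a different label
        text = template.format(phrase=phrase)
        # Filter step: discard candidates that lost the original pattern.
        if re.search(pattern, text, flags=re.IGNORECASE):
            candidates.append(Example(text, label))
    return candidates


seed = Example("The battery life is great.", "praise")
templates = {
    "praise": "I love the {phrase}.",
    "complaint": "The {phrase} is disappointing.",
    "question": "How long is the {phrase}?",
    "refund": "I want my money back.",  # drops the phrase, so it is filtered out
}
for cf in generate_counterfactuals(seed, r"battery life", templates):
    print(cf.label, "->", cf.text)
```

In this sketch the filter plays the role of the rule-based component: a generated sentence is kept only if the pattern survives the rewrite, so the retained examples differ in label while sharing the same surface feature.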
Findings
Significantly improves performance when little labeled data is available
The impact of data augmentation diminishes as data size increases
Addresses cold start problem in active learning
Abstract
Active Learning (AL) allows models to learn interactively from user feedback. This paper introduces a counterfactual data augmentation approach to AL, particularly addressing the selection of datapoints for user querying, a pivotal concern in enhancing data efficiency. Our approach is inspired by Variation Theory, a theory of human concept learning that emphasizes the essential features of a concept by focusing on what stays the same and what changes. Instead of just querying with existing datapoints, our approach synthesizes artificial datapoints that highlight potential key similarities and differences among labels using a neuro-symbolic pipeline combining large language models (LLMs) and rule-based models. Through an experiment in the example domain of text classification, we show that our approach achieves significantly higher performance when there are fewer annotated data. As the…
Taxonomy
Topics: Data Stream Mining Techniques · Neural Networks and Applications · Machine Learning and Algorithms
