DALL: Data Labeling via Data Programming and Active Learning Enhanced by Large Language Models
Guozheng Li, Ao Wang, Shaoxiang Wang, Yu Zhang, Pengcheng Cao, Yang Bai, Chi Harold Liu

TL;DR
DALL is an interactive text labeling framework that combines data programming, active learning, and large language models to improve label quality and reduce labeling costs in NLP tasks.
Contribution
It introduces a structured specification for defining labeling functions and integrates large language models to assist in label correction and function refinement.
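To make the idea of defining labeling functions via configuration concrete, here is a minimal sketch. The schema (`name`, `type`, `keywords`, `label`), the helper names, and the example labels are all hypothetical and are not DALL's actual specification. Simple majority voting stands in for the label aggregation step; data programming systems typically use a learned label model instead.

```python
from collections import Counter

ABSTAIN = None  # returned when a labeling function does not fire

# Hypothetical configuration: each entry declares a keyword rule as data, not code.
LF_CONFIG = [
    {"name": "lf_refund", "type": "keyword",
     "keywords": ["refund", "money back"], "label": "complaint"},
    {"name": "lf_thanks", "type": "keyword",
     "keywords": ["thank", "great"], "label": "praise"},
    {"name": "lf_broken", "type": "keyword",
     "keywords": ["broken", "refund"], "label": "complaint"},
]


def build_lf(spec):
    """Turn one configuration entry into a callable labeling function."""
    if spec["type"] != "keyword":
        raise ValueError(f"unsupported labeling function type: {spec['type']}")
    keywords = [k.lower() for k in spec["keywords"]]
    label = spec["label"]

    def lf(text):
        lowered = text.lower()
        # Fire if any keyword occurs as a substring; otherwise abstain.
        return label if any(k in lowered for k in keywords) else ABSTAIN

    lf.__name__ = spec["name"]
    return lf


def majority_vote(text, lfs):
    """Aggregate labeling function outputs; abstain on no votes or a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


lfs = [build_lf(spec) for spec in LF_CONFIG]
```

Because the rules are plain data, a user or a large language model could propose or refine an entry by editing the configuration, without writing Python.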
Findings
A comparative study shows that DALL improves labeling efficiency on text labeling tasks.
An ablation study shows that DALL's modules improve label quality.
A usability study supports DALL's practical usability.
Abstract
Deep learning models for natural language processing rely heavily on high-quality labeled datasets. However, existing labeling approaches often struggle to balance label quality with labeling cost. To address this challenge, we propose DALL, a text labeling framework that integrates data programming, active learning, and large language models. DALL introduces a structured specification that allows users and large language models to define labeling functions via configuration, rather than code. Active learning identifies informative instances for review, and the large language model analyzes these instances to help users correct labels and to refine or suggest labeling functions. We implement DALL as an interactive labeling system for text labeling tasks. Comparative, ablation, and usability studies demonstrate DALL's efficiency, the effectiveness of its modules, and its usability.
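The abstract does not say which criterion DALL uses to identify informative instances. As an illustration only, the sketch below uses entropy-based uncertainty sampling, a common active learning strategy; the function names, document IDs, and probabilities are made up for the example.

```python
import math


def entropy(probs):
    """Shannon entropy of a class-probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_review(predictions, k):
    """Return the k instance ids whose predicted distribution is most uncertain.

    predictions maps an instance id to a list of class probabilities.
    """
    ranked = sorted(predictions, key=lambda i: entropy(predictions[i]), reverse=True)
    return ranked[:k]


# Illustrative model outputs for three unlabeled documents.
predictions = {
    "doc_a": [0.98, 0.02],  # confident prediction
    "doc_b": [0.55, 0.45],  # near the decision boundary
    "doc_c": [0.70, 0.30],
}

to_review = select_for_review(predictions, k=2)
```

In a workflow like the one described, the selected instances would be the ones shown to the user and passed to the large language model for label correction and labeling function suggestions.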
Taxonomy
Topics: Machine Learning and Algorithms · Natural Language Processing Techniques · Topic Modeling
