Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort
Franziska Weeber, Felix Hamborg, Karsten Donnay, Bela Gipp

TL;DR
This paper introduces a semi-automatic annotation tool that combines active learning with pre-trained language models to significantly reduce manual annotation effort while maintaining high quality, demonstrated on news article framing tasks.
Contribution
The paper presents a novel active learning approach integrated with pre-trained language models for efficient text annotation, reducing manual effort needed for high-quality datasets.
Findings
On the framing dataset, active learning reduces the required annotations to 16.3% of the full dataset.
The approach reaches comparable performance with far fewer manual annotations.
The approach remains effective for the complex and subtle frames found in news articles.
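To make the 16.3% figure concrete, the short sketch below converts it into annotation counts. The reference size of 1,000 annotations is purely hypothetical and is not taken from the paper; the function names are illustrative as well.

```python
from fractions import Fraction
from math import ceil

# Share of annotations the AL approach is reported to need on the framing dataset.
AL_SHARE = Fraction("0.163")

def annotations_needed(reference_annotations: int, share: Fraction = AL_SHARE) -> int:
    """Annotations required under AL, rounded up to a whole annotation."""
    return ceil(reference_annotations * share)

def annotations_saved(reference_annotations: int, share: Fraction = AL_SHARE) -> int:
    """Annotations that no longer have to be made by hand."""
    return reference_annotations - annotations_needed(reference_annotations, share)

# Hypothetical reference size, chosen only for illustration.
needed = annotations_needed(1000)
saved = annotations_saved(1000)
```

Using `Fraction` rather than a float avoids rounding surprises when the product is passed to `ceil`.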
Abstract
Large amounts of annotated data have become more important than ever, especially since the rise of deep learning techniques. However, manual annotations are costly. We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations, thus strongly reducing annotation cost and effort. For this purpose, we combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories in the given text documents. To highlight our research direction's potential, we evaluate the approach on the task of identifying frames in news articles. Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even these complex and subtle frames. On the framing dataset, the AL approach needs only 16.3% of the annotations to reach the…
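The general idea of pool-based active learning described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the query strategy, so least-confidence sampling is an assumption here, and a tiny naive Bayes classifier stands in for the pre-trained language model so that the example runs with the standard library alone. The frame labels ("economic", "security"), the example sentences, and all function and class names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Tiny multinomial naive Bayes; a stand-in for a pre-trained language model."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict_proba(self, text):
        total = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            # Add-one smoothing, with one extra slot for unseen words.
            denom = sum(self.word_counts[c].values()) + len(self.vocab) + 1
            score = math.log(self.class_counts[c] / total)
            for w in tokenize(text):
                score += math.log((self.word_counts[c][w] + 1) / denom)
            log_scores[c] = score
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}

def least_confident(model, pool):
    """Index of the pool item whose most likely class has the lowest probability."""
    return min(range(len(pool)),
               key=lambda i: max(model.predict_proba(pool[i]).values()))

def active_learning(seed, pool, oracle, budget):
    """Query the annotator (oracle) for the most uncertain items, retraining each time."""
    labeled, pool = list(seed), list(pool)
    model = NaiveBayes().fit(*zip(*labeled))
    for _ in range(min(budget, len(pool))):
        text = pool.pop(least_confident(model, pool))
        labeled.append((text, oracle(text)))  # one manual annotation
        model = NaiveBayes().fit(*zip(*labeled))
    return model, labeled

# Hypothetical toy data; the dictionary plays the role of the human annotator.
gold = {
    "wages rise as jobs grow": "economic",
    "border guard troops on alert": "security",
    "new tax on border trade": "economic",
    "jobs and wages fall": "economic",
}
seed = [("tax cuts boost jobs and wages", "economic"),
        ("troops deployed to guard the border", "security")]
model, labeled = active_learning(seed, list(gold), gold.get, budget=2)
```

In this toy run the first query goes to the sentence that mixes vocabulary from both frames, which is the behaviour uncertainty-based sampling is meant to produce: manual effort is spent where the model is least sure, and the remaining items can be labelled automatically.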
Taxonomy
Topics: Machine Learning and Algorithms · Topic Modeling · Natural Language Processing Techniques
