Active learning for reducing labeling effort in text classification tasks

Pieter Floris Jacobs; Gideon Maillette de Buy Wenniger; Marco Wiering; Lambert Schomaker

arXiv:2109.04847·cs.CL·November 5, 2021

TL;DR

This paper empirically evaluates active learning strategies for text classification using BERT, demonstrating that uncertainty-based AL outperforms random sampling, with performance influenced by query-pool size.

Contribution

It provides an empirical comparison of uncertainty-based active learning algorithms for text classification, using BERT$_{base}$ as the classifier.

Findings

01 Uncertainty-based AL outperforms random sampling with BERT.

02 The explored heuristics did not improve AL performance.

03 The performance gap between uncertainty-based AL and random sampling decreases as the query-pool size increases.

Abstract

Labeling data can be an expensive task as it is usually performed manually by domain experts. This is cumbersome for deep learning, as it is dependent on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by only using the data which the used model deems most informative. Little research has been done on AL in a text classification setting and next to none has involved the more recent, state-of-the-art Natural Language Processing (NLP) models. Here, we present an empirical study that compares different uncertainty-based algorithms with BERT$_{base}$ as the used classifier. We evaluate the algorithms on two NLP classification datasets: Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore heuristics that aim to solve presupposed problems of uncertainty-based AL; namely, that it is unscalable and that it is prone to selecting…
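The core idea of uncertainty-based AL can be illustrated with a minimal sketch. It assumes maximum predictive entropy as the uncertainty criterion, which is one common choice and not necessarily one of the algorithms compared in the paper. The function names, the query size, and the probability values are hypothetical; the probabilities stand in for the softmax output of a classifier such as BERT$_{base}$ on an unlabeled pool.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs, query_size):
    """Return indices of the `query_size` most uncertain unlabeled examples."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: predictive_entropy(pool_probs[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:query_size]


# Hypothetical softmax outputs for four unlabeled sentences (binary sentiment).
pool_probs = [
    [0.98, 0.02],  # confident prediction, low entropy
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],  # moderately confident
    [0.50, 0.50],  # maximally uncertain
]

queried = select_queries(pool_probs, query_size=2)
print(queried)  # [3, 1]
```

In a full AL loop, the selected examples would be sent to an annotator, added to the labeled set, and the classifier retrained before the next query round. Random sampling, the baseline mentioned above, would instead draw the same number of indices uniformly from the pool.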

Code & Models

Repositories

pieter-jacobs/bachelor-thesis (PyTorch, official)
