# The Use of Unlabeled Data versus Labeled Data for Stopping Active Learning for Text Classification

**Authors:** Garrett Beatty, Ethan Kochis, Michael Bloodgood

arXiv: 1901.09126 · 2019-04-24

## TL;DR

This paper compares stopping methods for active learning in text classification and finds that methods based on unlabeled data are more effective than methods based on labeled data.

## Contribution

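The paper compares and contrasts stopping methods that draw on three information sources: an additional labeled set of data, an unlabeled set of data, and the training data labeled during active learning. According to the authors, no such comparison had been made before.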

## Key findings

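- Stopping methods that use unlabeled data are more effective than methods that use labeled data.
- Three information sources can inform when to stop: an additional labeled set, an unlabeled set, and the training data labeled during active learning.
- The paper contrasts the advantages and disadvantages of stopping methods built on each of these sources.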

## Abstract

Annotation of training data is the major bottleneck in the creation of text classification systems. Active learning is a commonly used technique to reduce the amount of training data one needs to label. A crucial aspect of active learning is determining when to stop labeling data. Three potential sources for informing when to stop active learning are an additional labeled set of data, an unlabeled set of data, and the training data that is labeled during the process of active learning. To date, no one has compared and contrasted the advantages and disadvantages of stopping methods based on these three information sources. We find that stopping methods that use unlabeled data are more effective than methods that use labeled data.
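To make the idea of stopping on unlabeled data concrete, the following is a minimal sketch of one plausible approach: after each active learning round, the current model predicts labels for a fixed unlabeled "stop set", and labeling stops once successive models agree closely on those predictions. The function names, the use of Cohen's kappa as the agreement measure, and the `threshold` and `window` defaults are assumptions made for illustration; this summary does not specify which stopping methods the paper evaluates.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa agreement between two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    # Fraction of items on which the two sequences agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each sequence's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both sequences contain one identical label: treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def should_stop(
    prediction_history: Sequence[Sequence[int]],
    threshold: float = 0.99,  # illustrative value, an assumption
    window: int = 3,          # illustrative value, an assumption
) -> bool:
    """Decide whether to stop labeling.

    prediction_history[i] holds the labels that the model from round i
    predicted for the same fixed unlabeled stop set. Stop once the last
    `window` consecutive pairs of models all agree with kappa >= threshold.
    """
    if len(prediction_history) < window + 1:
        return False
    recent = prediction_history[-(window + 1):]
    return all(
        cohen_kappa(prev, curr) >= threshold
        for prev, curr in zip(recent, recent[1:])
    )


if __name__ == "__main__":
    # Hypothetical predictions on a 4-item stop set over 5 rounds.
    history = [
        [0, 1, 0, 1],
        [0, 1, 1, 1],
        [0, 1, 1, 1],
        [0, 1, 1, 1],
        [0, 1, 1, 1],
    ]
    for rounds in range(1, len(history) + 1):
        print(rounds, should_stop(history[:rounds]))
```

Because the stop set needs no labels, a check like this adds no annotation cost. A stopping method based on a labeled set would instead require annotating extra held-out data to track performance.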

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.09126/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1901.09126/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1901.09126/full.md

---
Source: https://tomesphere.com/paper/1901.09126