Small-Text: Active Learning for Text Classification in Python
Christopher Schröder, Lydia Müller, Andreas Niekler, Martin Potthast

TL;DR
small-text is a versatile Python library that simplifies active learning for text classification, supporting various classifiers, strategies, and GPU acceleration, and enabling efficient experimentation and application development.
Contribution
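The paper introduces small-text, an active learning library that integrates scikit-learn, PyTorch, and Hugging Face transformers behind standardized interfaces for classifiers, query strategies, and stopping criteria, so that experiments and applications can be developed and evaluated quickly.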
Findings
SetFit training matches vanilla transformer fine-tuning in classification accuracy
SetFit outperforms vanilla transformer fine-tuning in area under the curve (AUC)
The library includes query strategies that can run on the GPU
Abstract
We introduce small-text, an easy-to-use active learning library, which offers pool-based active learning for single- and multi-label text classification in Python. It features numerous pre-implemented state-of-the-art query strategies, including some that leverage the GPU. Standardized interfaces allow the combination of a variety of classifiers, query strategies, and stopping criteria, facilitating a quick mix and match, and enabling a rapid and convenient development of both active learning experiments and applications. With the objective of making various classifiers and query strategies accessible for active learning, small-text integrates several well-known machine learning libraries, namely scikit-learn, PyTorch, and Hugging Face transformers. The latter integrations are optionally installable extensions, so GPUs can be used but are not required. Using this new library, we…
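The mix-and-match idea described in the abstract can be illustrated with a minimal, library-independent sketch of a pool-based active learning loop. This is not small-text's API: every name below (`NaiveBayesClassifier`, `least_confidence`, `active_learning_loop`, `should_stop`, the toy pool) is hypothetical and uses only the Python standard library. It assumes a classifier exposing `fit` and `predict_proba`, a query strategy that picks unlabeled indices, and a stopping criterion expressed as a simple predicate.

```python
import math
from collections import Counter


class NaiveBayesClassifier:
    """Tiny multinomial naive Bayes over whitespace tokens (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict_proba(self, text):
        total_docs = sum(self.class_counts.values())
        log_scores = []
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / total_docs)
            for w in text.lower().split():
                # Laplace smoothing; the extra +1 reserves mass for unseen tokens.
                score += math.log(
                    (self.word_counts[c][w] + 1) / (total_words + len(self.vocab) + 1)
                )
            log_scores.append(score)
        # Normalize log scores into probabilities in a numerically stable way.
        top = max(log_scores)
        exps = [math.exp(s - top) for s in log_scores]
        return [e / sum(exps) for e in exps]

    def predict(self, text):
        probs = self.predict_proba(text)
        return self.classes[probs.index(max(probs))]


def least_confidence(clf, texts, unlabeled, n):
    """Query strategy: pick the n pool items whose top class probability is lowest."""
    ranked = sorted(unlabeled, key=lambda i: max(clf.predict_proba(texts[i])))
    return ranked[:n]


def active_learning_loop(texts, oracle, initial, clf, query, should_stop, batch_size=2):
    """Alternate between training, querying the pool, and asking the oracle for labels."""
    labeled = {i: oracle(i) for i in initial}
    unlabeled = [i for i in range(len(texts)) if i not in labeled]
    iteration = 0
    while True:
        clf.fit([texts[i] for i in labeled], [labeled[i] for i in labeled])
        if not unlabeled or should_stop(iteration, labeled):
            break
        for i in query(clf, texts, unlabeled, batch_size):
            labeled[i] = oracle(i)  # a human annotator in a real application
            unlabeled.remove(i)
        iteration += 1
    return clf, labeled


# Toy pool; the gold labels stand in for a human annotator.
POOL = [
    ("great movie loved it", "pos"),
    ("wonderful acting great story", "pos"),
    ("loved the wonderful cast", "pos"),
    ("terrible movie hated it", "neg"),
    ("awful acting boring story", "neg"),
    ("hated the boring plot", "neg"),
    ("great story wonderful plot", "pos"),
    ("awful terrible cast", "neg"),
]
texts = [t for t, _ in POOL]
gold = [y for _, y in POOL]

clf, labeled = active_learning_loop(
    texts,
    oracle=lambda i: gold[i],
    initial=[0, 3],  # one seed example per class
    clf=NaiveBayesClassifier(),
    query=least_confidence,
    should_stop=lambda iteration, labeled: len(labeled) >= 6,  # fixed label budget
)
print(len(labeled), "labeled examples; prediction:", clf.predict("great story"))
```

Because the loop only depends on the three call signatures, swapping in a different classifier, query strategy, or stopping criterion means passing a different object or function, which is the kind of combination the standardized interfaces are meant to enable.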
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Computational Physics and Python Applications
