To Softmax, or not to Softmax: that is the question when applying Active Learning for Transformer Models

Julius Gonsior; Christian Falkenberg; Silvio Magino; Anja Reusch; Maik Thiele; Wolfgang Lehner

arXiv:2210.03005 · cs.LG · March 13, 2025 · 5 citations

TL;DR

This paper investigates how well different confidence measures work for active learning with Transformer models. It finds that systematically ignoring the most uncertain samples (outliers) during selection can outperform traditional softmax-based selection.

Contribution

The paper compares eight alternatives to softmax-based confidence estimation for active learning with Transformer models on seven datasets. It also proposes a heuristic that systematically ignores the most uncertain samples to improve performance.
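The following minimal sketch illustrates the heuristic of skipping the most uncertain samples. It makes two assumptions. First, uncertainty is measured by least confidence, computed from softmax outputs. Second, the ignored samples are a fixed fraction of the top-ranked (most uncertain) pool items. The helper names `select_with_clipping` and `clip_fraction` are hypothetical and do not come from the paper or from its repository. The exact selection rule is given in the paper itself.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty = 1 - probability of the predicted class."""
    return 1.0 - max(probs)


def select_with_clipping(pool_logits: Sequence[Sequence[float]],
                         batch_size: int,
                         clip_fraction: float) -> List[int]:
    """Hypothetical helper (assumption, not the authors' code):
    rank unlabeled samples by least-confidence uncertainty, skip the
    top `clip_fraction` of the ranking (treated as likely outliers),
    and return the indices of the next `batch_size` samples."""
    scores = [least_confidence(softmax(z)) for z in pool_logits]
    # Indices sorted from most to least uncertain.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_clip = int(len(ranked) * clip_fraction)
    return ranked[n_clip:n_clip + batch_size]


# Toy pool of two-class logits (illustrative values only).
pool = [[5.0, 0.0], [0.1, 0.0], [1.0, 0.0], [0.5, 0.0], [3.0, 0.0]]
print(select_with_clipping(pool, batch_size=2, clip_fraction=0.0))  # [1, 3]
print(select_with_clipping(pool, batch_size=2, clip_fraction=0.2))  # [3, 2]
```

With `clip_fraction=0.0` the sketch reduces to plain least-confidence sampling. Raising the fraction skips the most extreme candidates, which the paper's findings suggest are often outliers.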

Findings

1. Most methods are too good at identifying the truly most uncertain samples (outliers), so labeling only these outliers leads to worse performance.
2. Systematically ignoring the most uncertain samples improves active learning performance.
3. Softmax probabilities can be misleading as an uncertainty estimate in active learning.
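Softmax outputs can mislead partly because they depend on the scale of the logits, not only on their ranking. This sketch shows that effect in plain Python. It is one well-known illustration, not necessarily the argument made in the paper, and the logit values are chosen here for demonstration. Multiplying the same logits by 10 leaves the predicted class unchanged but pushes the reported confidence from about 0.66 to almost 1.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
scaled = [10 * z for z in logits]  # same ranking, larger magnitude

p = softmax(logits)
q = softmax(scaled)

# The predicted class (argmax) is identical in both cases ...
print(p.index(max(p)), q.index(max(q)))   # 0 0
# ... but the "confidence" differs drastically.
print(f"max prob, original logits: {max(p):.3f}")  # 0.659
print(f"max prob, scaled logits:   {max(q):.5f}")  # 0.99995
```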

Abstract

Despite achieving state-of-the-art results in nearly all Natural Language Processing applications, fine-tuning Transformer-based language models still requires a significant amount of labeled data to work. A well-known technique to reduce the human effort of acquiring a labeled dataset is Active Learning (AL): an iterative process in which only a minimal number of samples is labeled. AL strategies require access to a quantified confidence measure of the model predictions. A common choice is the softmax activation function for the final layer. As the softmax function provides misleading probabilities, this paper compares eight alternatives on seven datasets. Our almost paradoxical finding is that most of the methods are too good at identifying the true most uncertain samples (outliers), and that therefore labeling exclusively outliers results in worse performance. As…
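The abstract notes that AL query strategies need a quantified confidence measure, usually derived from softmax outputs. The sketch below derives three standard uncertainty scores from a sample's softmax probabilities: least confidence, margin, and entropy. It then queries the most uncertain pool samples. These are common textbook strategies used for illustration. They are not necessarily among the eight alternatives compared in the paper, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uncertainty_scores(logits: Sequence[float]) -> Dict[str, float]:
    """Three softmax-based scores; for each, higher means more uncertain."""
    p = sorted(softmax(logits), reverse=True)
    return {
        "least_confidence": 1.0 - p[0],
        # A small gap between the top two classes signals high uncertainty.
        "margin": 1.0 - (p[0] - p[1]),
        "entropy": -sum(pi * math.log(pi) for pi in p if pi > 0),
    }


def query(pool_logits: Sequence[Sequence[float]],
          batch_size: int,
          score: str = "entropy") -> List[int]:
    """Return indices of the `batch_size` most uncertain pool samples."""
    scores = [uncertainty_scores(z)[score] for z in pool_logits]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Toy pool of three-class logits (illustrative values only).
pool = [[4.0, 0.0, 0.0], [0.2, 0.1, 0.0], [2.0, 1.5, 0.0]]
print(query(pool, batch_size=2))  # [1, 2]
```

In a full AL loop, the selected indices would be sent to an annotator, added to the training set, and the model re-trained before the next query round.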

Code & Models

Repositories

jgonsior/btw-softmax-clipping (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Algorithms

Methods: Softmax