Sigmoid Head for Quality Estimation under Language Ambiguity

Tu Anh Dinh; Jan Niehues

arXiv:2601.00680 · cs.CL · March 30, 2026

TL;DR

This paper introduces the Sigmoid Head, an extra unembedding head with sigmoid activation trained on top of a pre-trained language model. It targets quality estimation when several outputs are valid, a case in which softmax probability is spread across the options and can misleadingly indicate low quality.

Contribution

The paper proposes a novel Sigmoid Head module with a heuristic training process to better estimate quality under language ambiguity, outperforming traditional softmax-based methods.

Findings

01. Sigmoid Head provides a more accurate quality signal than softmax.

02. The method is computationally efficient during training and inference.

03. Sigmoid Head is more robust to out-of-domain data without needing human-annotated quality labels.

Abstract

Language model (LM) probability is not a reliable quality estimator, as natural language is ambiguous. When multiple output options are valid, the model's probability distribution is spread across them, which can misleadingly indicate low output quality. This issue has two causes: (1) LMs' final output activation is softmax, which does not allow multiple correct options to receive high probabilities simultaneously, and (2) LMs' training data consists of single, one-hot encoded references, indicating that there is only one correct option at each output step. We propose training a module for Quality Estimation on top of pre-trained LMs to address these limitations. The module, called Sigmoid Head, is an extra unembedding head with sigmoid activation to tackle the first limitation. To tackle the second limitation, during the negative sampling process to train the Sigmoid Head, we use a…
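
The first limitation can be illustrated with a small numeric sketch. This is not the authors' implementation: the four-token vocabulary and the logit values are made up for illustration, and both activations are applied to the same logits only to isolate the effect of the activation itself. In the paper's setup the Sigmoid Head is a separately trained unembedding head, so its logits would generally differ from those of the softmax head.

```python
import math

def softmax(logits):
    """Normalize logits into a distribution that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Score each option independently in (0, 1); no shared normalization."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

# Hypothetical logits over a 4-token vocabulary (assumed values).
# The first two tokens stand for two equally valid continuations.
logits = [4.0, 4.0, -2.0, -3.0]

softmax_probs = softmax(logits)
sigmoid_scores = sigmoid(logits)

for name, values in (("softmax", softmax_probs), ("sigmoid", sigmoid_scores)):
    print(name, [round(v, 3) for v in values])
```

Because softmax forces the scores to sum to 1, the two valid tokens each end up just below 0.5, which reads as low confidence even though both are good choices. The sigmoid scores are computed per token, so both valid tokens can score close to 1 while the invalid ones stay low.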
