Sigmoid Head for Quality Estimation under Language Ambiguity
Tu Anh Dinh, Jan Niehues

TL;DR
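This paper introduces the Sigmoid Head, an extra unembedding head with sigmoid activation trained on top of a pre-trained language model. Because sigmoid scores are independent per token, several valid output options can receive high scores at the same step, so the quality estimate is not deflated by language ambiguity the way softmax probability is.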
Contribution
The paper proposes a novel Sigmoid Head module with a heuristic training process to better estimate quality under language ambiguity, outperforming traditional softmax-based methods.
Findings
Sigmoid Head provides a more accurate quality signal than softmax.
The method is computationally efficient during training and inference.
Sigmoid Head is more robust to out-of-domain data without needing human-annotated quality labels.
Abstract
Language model (LM) probability is not a reliable quality estimator, as natural language is ambiguous. When multiple output options are valid, the model's probability distribution is spread across them, which can misleadingly indicate low output quality. This issue has two causes: (1) LMs' final output activation is softmax, which does not allow multiple correct options to receive high probabilities simultaneously, and (2) LMs' training data consists of single, one-hot encoded references, indicating that there is only one correct option at each output step. We propose training a module for Quality Estimation on top of pre-trained LMs to address these limitations. The module, called Sigmoid Head, is an extra unembedding head with sigmoid activation to tackle the first limitation. To tackle the second limitation, during the negative sampling process to train the Sigmoid Head, we use a…
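The first limitation described in the abstract can be illustrated with a small numeric sketch. This is not the authors' implementation: the logits below are made-up values for a single output step, and for simplicity the same logits are passed through both activations, whereas the paper's Sigmoid Head is a separate, trained unembedding head with its own outputs. The sketch only shows why softmax splits probability mass across equally valid tokens while sigmoid does not.

```python
import math


def softmax(logits):
    """Normalize logits into a distribution that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Score each logit independently; scores need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


# Hypothetical logits (assumption, not from the paper):
# tokens 0 and 1 are two equally valid continuations, tokens 2 and 3 are poor.
logits = [4.0, 4.0, -2.0, -3.0]

softmax_probs = softmax(logits)
sigmoid_scores = sigmoid(logits)

for i, (p, s) in enumerate(zip(softmax_probs, sigmoid_scores)):
    print(f"token {i}: softmax={p:.3f}  sigmoid={s:.3f}")
```

Under softmax, each of the two valid tokens ends up just below 0.5 because they compete for the same probability mass, which could be misread as low confidence. Under sigmoid, both valid tokens score above 0.9 and the poor tokens stay low.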
