Similarity-Distance-Magnitude Language Models

Allen Schmaltz

arXiv:2510.26183·cs.CL·October 31, 2025

TL;DR

This paper introduces SDM language models: pre-trained decoder-only Transformer LMs converted through supervised fine-tuning with a final-layer SDM activation layer. Compared to strong supervised baselines, the resulting models abstain less often, which the paper treats as improved statistical efficiency.

Contribution

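The paper proposes SDM language models and a supervised fine-tuning method for converting existing pre-trained LMs into them. During training, a final-layer SDM activation layer estimates a change-of-base for the next-token loss over a contrastive input encoding scheme, with hard negative examples generated online.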

Findings

01 Reduced abstentions compared to baselines

02 Improved statistical efficiency in sequence prediction

03 Effective conversion of pre-trained models into SDM models

Abstract

We introduce Similarity-Distance-Magnitude (SDM) language models (LMs), which are sequence prediction models fine-tuned to maximize the proportion of generations in the well-calibrated, high-probability region partitioned by a final-layer SDM activation layer used for binary classification of instruction-following. We demonstrate that existing pre-trained decoder-only Transformer LMs can be readily converted into SDM LMs via supervised fine-tuning, using the final-layer SDM activation layer during training to estimate a change-of-base for a supervised next-token loss over a contrastive input encoding scheme, with additional hard negative examples generated online during training. This results in reduced abstentions (i.e., improved statistical efficiency) compared to strong supervised baselines.
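
To make the notion of an abstention concrete, the following minimal sketch shows generic selective prediction: a prediction is admitted only when its estimated probability clears a threshold, and the model abstains otherwise. This is not the SDM activation layer, whose definition is not given on this page. The function names, the 0.95 threshold, and the toy probabilities are all assumptions made for illustration.

```python
from typing import List, Optional

def selective_predict(probs: List[List[float]], threshold: float = 0.95) -> List[Optional[int]]:
    """Return the argmax class per example, or None (abstain) if its probability is below threshold.

    Hypothetical stand-in for a calibrated decision rule; not the paper's SDM estimator.
    """
    decisions: List[Optional[int]] = []
    for p in probs:
        best = max(range(len(p)), key=lambda i: p[i])
        decisions.append(best if p[best] >= threshold else None)
    return decisions

def abstention_rate(decisions: List[Optional[int]]) -> float:
    """Fraction of examples on which the model abstained."""
    if not decisions:
        return 0.0
    return sum(d is None for d in decisions) / len(decisions)

def admitted_accuracy(decisions: List[Optional[int]], labels: List[int]) -> Optional[float]:
    """Accuracy over admitted (non-abstained) predictions; None if everything was abstained."""
    admitted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    if not admitted:
        return None
    return sum(d == y for d, y in admitted) / len(admitted)

# Toy binary-classification outputs (illustrative values, not results from the paper).
probs = [[0.98, 0.02], [0.60, 0.40], [0.03, 0.97], [0.50, 0.50]]
labels = [0, 1, 1, 0]

decisions = selective_predict(probs, threshold=0.95)
rate = abstention_rate(decisions)
accuracy = admitted_accuracy(decisions, labels)
```

Under this framing, a model that keeps accuracy on admitted predictions fixed while lowering the abstention rate makes use of more of its outputs, which is the sense in which the abstract equates reduced abstentions with improved statistical efficiency.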
