Logit Distance Bounds Representational Similarity
Beatrix M. G. Nielsen, Emanuele Marconato, Luigi Gresele, Andrea Dittadi, Simon Buchholz

TL;DR
This paper introduces a logit-distance measure for evaluating and improving the linear similarity of internal representations in discriminative models, showing it outperforms KL divergence in distillation tasks.
Contribution
The authors propose a new logit-distance based measure that guarantees linear representational similarity, addressing limitations of KL divergence in model distillation.
Findings
Logit-distance bounds linear representational similarity.
KL divergence does not guarantee high linear similarity.
Logit-distance distillation improves concept recovery in models.
Abstract
For a broad family of discriminative models that includes autoregressive language models, identifiability results imply that if two models induce the same conditional distributions, then their internal representations agree up to an invertible linear transformation. We ask whether an analogous conclusion holds approximately when the distributions are close instead of equal. Building on the observation of Nielsen et al. (2025) that closeness in KL divergence need not imply high linear representational similarity, we study a distributional distance based on logit differences and show that closeness in this distance does yield linear similarity guarantees. Specifically, we define a representational dissimilarity measure based on the models' identifiability class and prove that it is bounded by the logit distance. We further show that, when model probabilities are bounded away from zero, KL…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsExplainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
