Why not to use Cosine Similarity between Label Representations

Beatrix M. G. Nielsen

arXiv:2603.29488 · cs.LG · April 1, 2026

TL;DR

This paper demonstrates that cosine similarity between label representations in softmax classifiers does not reliably reflect model behavior or probabilities, challenging common interpretability practices.

Contribution

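It proves that, for any softmax classifier and any two label representations, there is an equivalent model that assigns the same probabilities to every label and input but in which the cosine similarity between those two representations is 1 or -1. Cosine similarity can therefore be changed without changing model outputs, which makes it unreliable for interpretation.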
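One construction that produces this result is sketched below. It assumes a linear softmax head with logits z_k(x) = w_k^T h(x) + b_k, where w_k is the representation of label k and h(x) is the final hidden state; the symbols c, λ and d are introduced here for illustration, and the paper's own proof may proceed differently. Adding one vector c to every label representation shifts all logits of an input by the same amount c^T h(x), which the softmax cancels, so c is free to steer the cosine similarity of any two labels.

```latex
% Assumed linear softmax head
p(k \mid x) = \frac{\exp\big(w_k^\top h(x) + b_k\big)}{\sum_{j} \exp\big(w_j^\top h(x) + b_j\big)}

% Shift every label representation by the same vector c: w_k' = w_k + c
p'(k \mid x)
  = \frac{\exp\big(w_k^\top h(x) + b_k\big)\, e^{c^\top h(x)}}
         {\sum_{j} \exp\big(w_j^\top h(x) + b_j\big)\, e^{c^\top h(x)}}
  = p(k \mid x)

% Choose c = -w_2 + \lambda d with d = w_1 - w_2 \neq 0
w_1' = (1 + \lambda)\, d, \qquad w_2' = \lambda\, d

\cos(w_1', w_2')
  = \frac{\lambda (1 + \lambda)\, \lVert d \rVert^2}{\lvert \lambda \rvert\, \lvert 1 + \lambda \rvert\, \lVert d \rVert^2}
  = \begin{cases} +1 & \lambda > 0 \\ -1 & -1 < \lambda < 0 \end{cases}
```

For example, λ = -1/2 gives c = -(w_1 + w_2)/2, so w_1' = d/2 and w_2' = -d/2 point in opposite directions.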

Findings

01

Cosine similarity between label representations gives no information about the probabilities the model assigns.

02

Equivalent models, which assign identical probabilities to every label and input, can have label representations with cosine similarity of 1 or -1.

03

Centering representations does not resolve the interpretability issue.
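The centering finding can be illustrated with a small NumPy sketch. It assumes that "centering" means subtracting the mean label representation, and that the layer producing the hidden states ends in a linear map able to absorb an invertible matrix A, so the features become H A^T while the head becomes W A^{-1} and the logits stay the same. This is one illustrative mechanism, not necessarily the argument used in the paper; the toy W, A and the helper names are hypothetical. Centering cancels a shift shared by all label representations, but it does not cancel such a reparametrization.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax; subtracting the row max only improves numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def centered_cosine(W, i, j):
    # Assumed meaning of "centering": subtract the mean label representation first.
    Wc = W - W.mean(axis=0, keepdims=True)
    return cosine(Wc[i], Wc[j])

# Hypothetical toy head: 3 labels with 2-dimensional representations (one row per label).
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
b = np.zeros(3)
H = np.random.default_rng(0).normal(size=(10, 2))  # stand-in final hidden states
p = softmax(H @ W.T + b)

# 1) A shift c shared by all labels changes the raw cosine, but centering undoes it.
c = np.array([5.0, -3.0])
W_shift = W + c
print("raw cosine     :", cosine(W[0], W[1]), "->", cosine(W_shift[0], W_shift[1]))
print("centered cosine:", centered_cosine(W, 0, 1), "->", centered_cosine(W_shift, 0, 1))

# 2) Assumption: the layer producing H can absorb an invertible matrix A,
#    so features become H @ A.T and the head becomes W @ inv(A).
A = np.array([[1.0, -1.0],
              [0.0, 1.0]])
H2 = H @ A.T
W2 = W @ np.linalg.inv(A)    # equals W @ [[1, 1], [0, 1]]
p2 = softmax(H2 @ W2.T + b)  # H2 @ W2.T == H @ W.T, so the logits are unchanged

print("max |p2 - p|   :", np.abs(p2 - p).max())
print("centered cosine:", centered_cosine(W, 0, 1), "->", centered_cosine(W2, 0, 1))
```

In this toy example the centered cosine similarity of labels 0 and 1 moves from 0 to 1/√2 ≈ 0.707, while every probability stays the same.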

Abstract

Cosine similarity is often used to measure the similarity of vectors. These vectors might be the representations of neural network models. However, it is not guaranteed that cosine similarity of model representations will tell us anything about model behaviour. In this paper we show that when using a softmax classifier, be it an image classifier or an autoregressive language model, measuring the cosine similarity between label representations (called unembeddings in the paper) does not give any information on the probabilities assigned by the model. Specifically, we prove that for any softmax classifier model, given two label representations, it is possible to make another model which gives the same probabilities for all labels and inputs, but where the cosine similarity between the representations is now either 1 or -1. We give specific examples of models with very high or low cosine…
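A quick numerical check of the abstract's specific claim can be written with NumPy, assuming a linear softmax head (logits H W^T + b, one row of W per label) and random stand-in weights and hidden states. The helper shift_unembeddings and the choice c = -w_j + λ(w_i - w_j) are hypothetical illustrations, not the paper's code: adding the same c to every label representation moves all logits of an input by the same amount, so the probabilities are unchanged, while labels i and j end up with cosine similarity 1 (λ > 0) or -1 (-1 < λ < 0).

```python
import numpy as np

def softmax(z):
    # Row-wise softmax; subtracting the row max only improves numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def shift_unembeddings(W, i, j, lam):
    """Hypothetical helper: add one vector c to every row of W.

    With d = W[i] - W[j] and c = -W[j] + lam * d, the shifted rows are
    W'[i] = (1 + lam) * d and W'[j] = lam * d, so their cosine similarity is
    +1 for lam > 0 and -1 for -1 < lam < 0 (requires W[i] != W[j]).
    """
    d = W[i] - W[j]
    c = -W[j] + lam * d
    return W + c  # broadcasting adds c to every row

rng = np.random.default_rng(0)
num_labels, dim, batch = 5, 8, 16
W = rng.normal(size=(num_labels, dim))  # label representations (unembeddings)
b = rng.normal(size=num_labels)         # per-label bias
H = rng.normal(size=(batch, dim))       # stand-in final hidden states

p_orig = softmax(H @ W.T + b)

W_pos = shift_unembeddings(W, 0, 1, lam=1.0)   # cos(label 0, label 1) becomes +1
W_neg = shift_unembeddings(W, 0, 1, lam=-0.5)  # cos(label 0, label 1) becomes -1
p_pos = softmax(H @ W_pos.T + b)
p_neg = softmax(H @ W_neg.T + b)

print("original cosine:", round(cosine(W[0], W[1]), 3))
print("shifted cosines:", round(cosine(W_pos[0], W_pos[1]), 3), round(cosine(W_neg[0], W_neg[1]), 3))
print("max prob change:", np.abs(p_pos - p_orig).max(), np.abs(p_neg - p_orig).max())
```

The differences in probabilities are at the level of floating-point rounding, while the cosine similarity of the same two labels flips between 1 and -1.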
