Why not to use Cosine Similarity between Label Representations
Beatrix M. G. Nielsen

TL;DR
This paper demonstrates that cosine similarity between label representations in softmax classifiers does not reliably reflect model behavior or probabilities, challenging common interpretability practices.
Contribution
It proves that cosine similarity of label representations can be arbitrarily manipulated without changing model outputs, highlighting its unreliability for interpretation.
Findings
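Cosine similarity between label representations gives no information about the probabilities the model assigns.
For any softmax classifier and any two label representations, there is an equivalent model (same probabilities for all labels and inputs) in which the cosine similarity between those two representations is 1 or -1.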
Centering representations does not resolve the interpretability issue.
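The findings above can be illustrated with a minimal sketch. It relies only on the standard fact that softmax is unchanged when the same constant is added to every logit: adding one vector c to every label representation shifts all logits for a given input by the same amount, so the probabilities stay the same. This is an illustration under that assumption and not necessarily the construction used in the paper's proof; the names `U`, `shift_all`, `c_neg`, and `c_pos` are hypothetical, and the "model" is a toy with random label representations and random input representations.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def probabilities(unembeddings, h):
    # Logit for label i is the dot product u_i . h (toy model, no bias).
    return softmax([dot(u, h) for u in unembeddings])


def shift_all(unembeddings, c):
    # Add the same vector c to every label representation.
    return [[ui + ci for ui, ci in zip(u, c)] for u in unembeddings]


random.seed(0)
dim, n_labels = 4, 5
U = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_labels)]
u0, u1 = U[0], U[1]

# Shift that makes labels 0 and 1 antiparallel:
# u0 + c = (u0 - u1) / 2 and u1 + c = -(u0 - u1) / 2.
c_neg = [-(a + b) / 2 for a, b in zip(u0, u1)]

# Shift that makes labels 0 and 1 parallel:
# u0 + c = 2 * (u0 - u1) and u1 + c = (u0 - u1).
c_pos = [a - 2 * b for a, b in zip(u0, u1)]

U_neg = shift_all(U, c_neg)
U_pos = shift_all(U, c_pos)

# Compare probabilities on a few random input representations.
max_diff = 0.0
for _ in range(20):
    h = [random.gauss(0, 1) for _ in range(dim)]
    p = probabilities(U, h)
    for shifted in (U_neg, U_pos):
        q = probabilities(shifted, h)
        max_diff = max(max_diff, max(abs(a - b) for a, b in zip(p, q)))

print("original cosine:", cosine(u0, u1))
print("cosine after c_neg:", cosine(U_neg[0], U_neg[1]))
print("cosine after c_pos:", cosine(U_pos[0], U_pos[1]))
print("largest probability difference:", max_diff)
```

Up to floating-point error, the three toy models assign identical probabilities, yet the cosine similarity between the same pair of label representations differs in each. The sketch covers only this translation invariance; the finding that centering does not help indicates the paper also deals with transformations beyond a shared shift.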
Abstract
Cosine similarity is often used to measure the similarity of vectors. These vectors might be the representations of neural network models. However, it is not guaranteed that cosine similarity of model representations will tell us anything about model behaviour. In this paper we show that when using a softmax classifier, be it an image classifier or an autoregressive language model, measuring the cosine similarity between label representations (called unembeddings in the paper) does not give any information on the probabilities assigned by the model. Specifically, we prove that for any softmax classifier model, given two label representations, it is possible to make another model which gives the same probabilities for all labels and inputs, but where the cosine similarity between the representations is now either 1 or -1. We give specific examples of models with very high or low cosine…
