Mean-Pooled Cosine Similarity is Not Length-Invariant: Theory and Cross-Domain Evidence for a Length-Invariant Alternative
Sibayan Mitra (1), Dhruv Kumar (1) ((1) BITS Pilani)

TL;DR
This paper demonstrates that mean-pooled cosine similarity is not length-invariant in transformer representations, leading to biased similarity measures, and advocates for length-invariant metrics like CKA for cross-representation analysis.
Contribution
The paper provides theoretical and empirical evidence that mean-pooled cosine similarity is length-dependent and proposes CKA as a more reliable alternative for comparing neural representations.
Findings
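Under the anisotropy typical of transformer representations, mean-pooled cosine similarity increases monotonically with sequence length, independent of representational content.
Substituting CKA for cosine reduces explained variance by 83% and reverses the sign of the length coefficient.
The length effect appears across four code LLMs on HumanEvalPack and in Mistral-7B on parallel WMT pairs (EN-FR, EN-DE).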
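A small simulation can make the length effect concrete. The sketch below is not the paper's experiment. It assumes a toy model in which every token vector is a shared offset plus independent Gaussian noise, a crude stand-in for anisotropy, and the names `average_similarity`, `shift`, and `dim` are illustrative choices rather than anything taken from the paper. Two sequences generated this way share nothing beyond the offset, yet their mean-pooled cosine rises with length, because averaging shrinks the noise while the offset stays fixed.

```python
import math
import random


def mean_pool(tokens):
    """Average a list of token vectors into a single vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def random_sequence(rng, length, dim, shift):
    """Toy anisotropic tokens: a constant offset on every dimension plus unit Gaussian noise."""
    return [[shift + rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(length)]


def average_similarity(length, dim=32, shift=0.25, trials=100, seed=0):
    """Mean-pooled cosine between independently drawn sequences, averaged over trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = random_sequence(rng, length, dim, shift)
        b = random_sequence(rng, length, dim, shift)
        total += cosine(mean_pool(a), mean_pool(b))
    return total / trials


# Similarity between unrelated sequences at increasing lengths.
results = {n: average_similarity(n) for n in (4, 16, 64)}
for n, sim in results.items():
    print(f"length={n:3d}  mean-pooled cosine={sim:.3f}")
```

Under this toy model the expected similarity is heuristically close to ||mu||^2 / (||mu||^2 + d*sigma^2/n), where mu is the shared offset, d the dimension, sigma^2 the per-dimension noise variance, and n the sequence length. That ratio increases in n and approaches 1 regardless of content. Real transformer representations are not this simple, so the sketch only illustrates the direction of the effect.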
Abstract
Mean-pooled cosine similarity is the default metric for comparing neural representations across languages, modalities, and tasks. We establish that this metric is not length-invariant: under the anisotropy that characterizes modern transformer representations, mean-pooled cosine grows monotonically in sequence length, independent of representational content. Empirically, on HumanEvalPack across four code LLMs, the length ratio alone explains -- of cross-language "Python proximity," while AST depth and shared-token fraction add less than 3% of explained variance beyond length. Substituting Centered Kernel Alignment (CKA) reduces explained variance by 83% and reverses the sign of the length coefficient (). The same pattern holds in Mistral-7B on parallel WMT pairs ( EN-FR, EN-DE for cosine; for…
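For readers unfamiliar with CKA, the following is a minimal sketch of the standard linear variant, written with the Python standard library only. It assumes two row-aligned matrices with one row per example and one column per feature. The abstract as shown here does not say which CKA variant or which notion of "example" the paper uses, so this is an illustration of the general technique, and `linear_cka` and its helpers are hypothetical names. One relevant property is visible in the code: features are centered before comparison, so an offset shared by every row does not inflate the score.

```python
import math
import random


def center_columns(x):
    """Subtract each column's mean so every feature has zero mean across examples."""
    n = len(x)
    means = [sum(col) / n for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]


def cross_frobenius_sq(x, y):
    """Squared Frobenius norm of x^T y for row-aligned matrices x (n x p) and y (n x q)."""
    total = 0.0
    for col_x in zip(*x):
        for col_y in zip(*y):
            total += sum(a * b for a, b in zip(col_x, col_y)) ** 2
    return total


def linear_cka(x, y):
    """Linear CKA: ||Xc^T Yc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F) on centered features."""
    xc, yc = center_columns(x), center_columns(y)
    denom = math.sqrt(cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc))
    return cross_frobenius_sq(xc, yc) / denom


rng = random.Random(0)
x = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(30)]
# Same representation, rescaled and shifted by a constant offset on every feature.
shifted = [[3.0 * v + 5.0 for v in row] for row in x]
# An unrelated representation of the same 30 examples.
unrelated = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(30)]

cka_shifted = linear_cka(x, shifted)
cka_unrelated = linear_cka(x, unrelated)
print(f"CKA(x, 3x + 5)     = {cka_shifted:.6f}")
print(f"CKA(x, unrelated)  = {cka_unrelated:.6f}")
```

The rescaled and shifted copy scores 1 up to floating-point error, since linear CKA is invariant to isotropic scaling and to constant offsets, while the unrelated matrix scores well below that. Cosine on raw vectors has no such centering step, so a shared offset contributes directly to the similarity.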
