Mean-Pooled Cosine Similarity is Not Length-Invariant: Theory and Cross-Domain Evidence for a Length-Invariant Alternative

Sibayan Mitra (1); Dhruv Kumar (1) ((1) BITS Pilani)

arXiv:2605.07345 · cs.CL · May 11, 2026

TL;DR

This paper shows that mean-pooled cosine similarity over transformer representations is not length-invariant: under anisotropy it grows with sequence length regardless of content, which biases cross-representation comparisons. The authors advocate length-invariant metrics such as Centered Kernel Alignment (CKA) instead.

Contribution

The paper provides theoretical and empirical evidence that mean-pooled cosine similarity is length-dependent and proposes CKA as a more reliable alternative for comparing neural representations.

Findings

01

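Under the anisotropy typical of modern transformer representations, mean-pooled cosine similarity increases monotonically with sequence length, independent of representational content.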

02

Replacing cosine with CKA reduces explained variance by 83% and reverses the sign of the length coefficient ($\beta_{\text{len}}$: $+0.86 \to -0.37$).

03

The length effect appears across four code LLMs on HumanEvalPack and in Mistral-7B on parallel WMT pairs (EN-FR and EN-DE).
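The first finding can be illustrated with a toy simulation. The sketch below is not the paper's experiment: it assumes each token vector is a shared offset `mu` (a stand-in for anisotropy) plus independent Gaussian noise, so two sequences share no content beyond that offset. Under this assumed model, averaging n tokens shrinks the noise while the offset stays, and the expected cosine is roughly ||mu||^2 / (||mu||^2 + sigma^2 d / n), which rises with n. All names (`toy_sequence`, `average_similarity`) and parameter values are hypothetical.

```python
import math
import random


def mean_pool(tokens):
    """Average a list of equal-length token vectors into one vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def toy_sequence(rng, n_tokens, mu, sigma=1.0):
    """Assumed toy model: every token is the shared offset mu plus i.i.d. noise."""
    return [[m + rng.gauss(0.0, sigma) for m in mu] for _ in range(n_tokens)]


def average_similarity(n_tokens, dim=32, trials=100, seed=0):
    """Mean pooled-cosine between pairs of independent sequences of equal length."""
    rng = random.Random(seed)
    mu = [1.0 / math.sqrt(dim)] * dim  # unit-norm common direction (toy anisotropy)
    total = 0.0
    for _ in range(trials):
        a = toy_sequence(rng, n_tokens, mu)
        b = toy_sequence(rng, n_tokens, mu)
        total += cosine(mean_pool(a), mean_pool(b))
    return total / trials


lengths = [2, 8, 32, 128]
averages = {n: average_similarity(n) for n in lengths}
for n in lengths:
    print(n, round(averages[n], 3))
```

Because the two sequences in each pair are generated independently, any increase in the printed averages comes from sequence length alone in this toy setting.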

Abstract

Mean-pooled cosine similarity is the default metric for comparing neural representations across languages, modalities, and tasks. We establish that this metric is not length-invariant: under the anisotropy that characterizes modern transformer representations, mean-pooled cosine grows monotonically in sequence length, independent of representational content. Empirically, on HumanEvalPack across four code LLMs, the length ratio alone explains $R^{2} = 0.52$–$0.75$ of cross-language "Python proximity," while AST depth and shared-token fraction add less than 3% of explained variance beyond length. Substituting Centered Kernel Alignment (CKA) reduces explained variance by 83% and reverses the sign of the length coefficient ($\beta_{\text{len}}$: $+0.86 \to -0.37$). The same pattern holds in Mistral-7B on parallel WMT pairs ($R^{2} = 0.23$ EN-FR, $R^{2} = 0.33$ EN-DE for cosine; $R^{2} < 0.01$ for…
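For readers unfamiliar with CKA, here is a minimal sketch of the standard linear variant, computed as ||Y^T X||_F^2 / (||X^T X||_F ||Y^T Y||_F) on column-centered matrices whose rows are examples. How the paper builds its matrices (which rows, which layers, which kernel) is not stated in this excerpt, so the inputs below are hypothetical random data and the helper names are illustrative only. The example shows one relevant property: because of the centering step, adding the same offset to every row, a crude stand-in for a shared anisotropic direction, does not change the score.

```python
import math
import random


def center_columns(rows):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def cross_sq_norm(x, y):
    """Squared Frobenius norm of x^T y for row-major matrices with equal row counts."""
    x_cols = list(zip(*x))
    y_cols = list(zip(*y))
    total = 0.0
    for xc in x_cols:
        for yc in y_cols:
            dot = sum(a * b for a, b in zip(xc, yc))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two (examples x features) matrices given as lists of rows."""
    x, y = center_columns(x), center_columns(y)
    numerator = cross_sq_norm(x, y)
    denominator = math.sqrt(cross_sq_norm(x, x) * cross_sq_norm(y, y))
    return numerator / denominator


rng = random.Random(0)
x = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(50)]
# Same representation, rescaled and shifted by a constant offset on every row.
shifted = [[2.0 * v + 5.0 for v in row] for row in x]
# An independent random representation of the same 50 examples.
unrelated = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(50)]

cka_shifted = linear_cka(x, shifted)
cka_unrelated = linear_cka(x, unrelated)
print(round(cka_shifted, 6), round(cka_unrelated, 3))
```

Note that CKA scores a whole set of examples at once, whereas mean-pooled cosine scores a single pair of pooled vectors, so the two metrics are not drop-in replacements at the level of one pair.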
