Evaluating Representational Similarity Measures from the Lens of Functional Correspondence

Yiqing Bo; Ansh Soni; Sudhanshu Srivastava; Meenakshi Khosla

arXiv:2411.14633 · q-bio.NC · September 16, 2025 · 2 cites

TL;DR

This paper evaluates representational similarity measures used in neuroscience and AI and finds that geometry-focused metrics such as CKA and Procrustes align best with behavioral data, which can guide the choice of metric.
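For readers less familiar with the two metrics singled out above, here is a minimal NumPy sketch of linear CKA and an orthogonal Procrustes distance between two (stimuli × units) activation matrices. It follows common textbook definitions and is an assumption on our part, not the paper's implementation; the function names are hypothetical.

```python
import numpy as np

def center(X):
    """Column-center a (stimuli x units) activation matrix."""
    return X - X.mean(axis=0, keepdims=True)

def linear_cka(X, Y):
    """Linear CKA between two representations of the same stimuli (rows)."""
    X, Y = center(X), center(Y)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    self_x = np.linalg.norm(X.T @ X, "fro")
    self_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (self_x * self_y)

def procrustes_distance(X, Y):
    """Orthogonal Procrustes distance after centering and unit-Frobenius scaling.

    Widths may differ: the nuclear norm below equals the one obtained by
    zero-padding the narrower representation.
    """
    X, Y = center(X), center(Y)
    X = X / np.linalg.norm(X, "fro")
    Y = Y / np.linalg.norm(Y, "fro")
    # min_Q ||X - Y Q||_F^2 = 2 - 2 * ||X^T Y||_* for unit-norm X, Y
    nuclear = np.linalg.svd(X.T @ Y, compute_uv=False).sum()
    return np.sqrt(max(0.0, 2.0 - 2.0 * nuclear))
```

Both scores are invariant to rotating the units of either representation, so a rotated copy of X gives a CKA of 1 and a Procrustes distance near 0.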

Contribution

It systematically compares eight similarity metrics in the visual domain, identifying which best reflect behavioral differences between models and which best distinguish models (e.g., trained vs. untrained).

Findings

01

CKA and Procrustes outperform others in behavioral alignment

02

Geometry-based metrics better differentiate trained vs. untrained models

03

Linear predictivity shows moderate behavioral correlation
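Linear predictivity is usually computed by fitting a regularized linear map from one representation to another and scoring it on held-out stimuli. The sketch below assumes ridge regression, a single random 80/20 split, and the mean Pearson r across target units; these choices, the function name, and its defaults are hypothetical and may differ from the paper's protocol.

```python
import numpy as np

def linear_predictivity(X, Y, alpha=1.0, train_frac=0.8, seed=0):
    """Ridge-map X -> Y on a train split; return mean held-out Pearson r over Y's units.

    All defaults (alpha, split fraction, seed) are illustrative assumptions.
    """
    n = X.shape[0]
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(train_frac * n)
    tr, te = idx[:n_train], idx[n_train:]
    # Center both splits with training-set means only
    x_mu, y_mu = X[tr].mean(axis=0), Y[tr].mean(axis=0)
    Xtr, Ytr = X[tr] - x_mu, Y[tr] - y_mu
    Xte, Yte = X[te] - x_mu, Y[te] - y_mu
    # Closed-form ridge solution W = (X^T X + alpha I)^{-1} X^T Y
    W = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(X.shape[1]), Xtr.T @ Ytr)
    pred = Xte @ W
    # Pearson correlation per target unit on held-out stimuli
    p = pred - pred.mean(axis=0)
    t = Yte - Yte.mean(axis=0)
    r = (p * t).sum(axis=0) / (np.linalg.norm(p, axis=0) * np.linalg.norm(t, axis=0))
    return r.mean()
```

Unlike CKA or Procrustes, this score is asymmetric: predicting Y from X generally gives a different value than predicting X from Y.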

Abstract

Neuroscience and artificial intelligence (AI) both face the challenge of interpreting high-dimensional neural data, where the comparative analysis of such data is crucial for revealing shared mechanisms and differences between these complex systems. Despite the widespread use of representational comparisons and the abundance of classes of comparison methods, a critical question remains: which metrics are most suitable for these comparisons? While some studies evaluate metrics based on their ability to differentiate models of different origins or constructions (e.g., various architectures), another approach is to assess how well they distinguish models that exhibit distinct behaviors. To investigate this, we examine the degree of alignment between various representational similarity measures and behavioral outcomes, employing group statistics and a comprehensive suite of behavioral metrics…
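One way to quantify the alignment between a similarity measure and behavior is to compare two models × models matrices: pairwise representational similarity under the metric, and pairwise behavioral similarity, correlated across their upper-triangle entries. The sketch below assumes both matrices are already computed and uses a Spearman correlation; it is an illustration, not the authors' exact group-statistics pipeline, and the function names are hypothetical.

```python
import numpy as np

def rankdata(x):
    """Ranks starting at 1; tied values receive their average rank."""
    x = np.asarray(x)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inv, counts = np.unique(x, return_inverse=True, return_counts=True)
    return (np.bincount(inv, weights=ranks) / counts)[inv]

def spearman(a, b):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return np.corrcoef(rankdata(a), rankdata(b))[0, 1]

def metric_behavior_alignment(rep_sim, beh_sim):
    """Correlate two (models x models) similarity matrices over distinct model pairs."""
    iu = np.triu_indices_from(rep_sim, k=1)  # skip the diagonal and duplicate pairs
    return spearman(rep_sim[iu], beh_sim[iu])
```

A metric scores high here when the model pairs it calls similar are also the pairs that behave similarly.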

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 4

Strengths

The paper is mostly straightforward and clear. Although an empirical study, it tackles a useful question. Figures 4 and 5 provide a clear and simple message.

Weaknesses

The captions for each figure can be improved by including more information about what properties are being averaged over and what are being correlated (e.g. the dots in Figure 3 represent datasets). The use of the word “behavior” is awkward to this reader, particularly when (unless I’m mistaken) the only thing you are considering is classification. This also makes the results sound much more general than they likely are. It is certainly reasonable that the success of CKA and Procrustes in…
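Since the behavioral outcomes discussed here come from image classification, one concrete classification-based behavioral similarity is error consistency: Cohen's κ computed on the trial-by-trial correctness of two models. The sketch below is illustrative only; whether this exact measure is among the paper's behavioral metrics is not stated in this summary, and the function name is hypothetical.

```python
import numpy as np

def error_consistency(correct_a, correct_b):
    """Cohen's kappa on per-trial correctness (True = correct) of two classifiers."""
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    observed = np.mean(a == b)  # fraction of trials where both are right or both wrong
    pa, pb = a.mean(), b.mean()
    # Agreement expected if the two models erred independently at their own accuracies
    expected = pa * pb + (1 - pa) * (1 - pb)
    return (observed - expected) / (1 - expected)
```

κ is 1 when the models err on exactly the same trials, near 0 when their errors are independent given their accuracies, and undefined when both are always correct or always wrong.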

Reviewer 02 · Rating 3 · Confidence 4

Strengths

Deeper understanding of representational similarity measures is an important topic. Comparisons to the actual classification behaviour of the models are an interesting new viewpoint. The basic results fit with earlier analyses and appear to be solid.

Weaknesses

While I generally believe the results and they are somewhat interesting, I think the analyses could be much more thorough and broad. The formula for RSA comparisons is wrong. To represent the classic formulation of RSA requires a standardisation of the X values and an important vectorisation step implied in $\tau$. And there are many more modern and preferred variations of this technique today. Similarly the linear encoding model description given here says nothing about the important steps of…
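The classic RSA pipeline the reviewer refers to can be sketched as follows: build a representational dissimilarity matrix (RDM) from correlation distance, which implicitly standardizes each stimulus pattern, vectorize its off-diagonal upper triangle, and compare two such vectors with a rank correlation. This sketch assumes Kendall's τ_A, one common choice for comparing RDMs; it is not a reconstruction of the paper's own formula, and the function names are hypothetical.

```python
import numpy as np

def correlation_rdm(X):
    """(stimuli x units) -> (stimuli x stimuli) RDM of 1 - Pearson r between patterns.

    np.corrcoef treats rows as variables, so each stimulus pattern is
    effectively z-scored across units before patterns are compared.
    """
    return 1.0 - np.corrcoef(X)

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; ties count 0."""
    i, j = np.triu_indices(len(x), k=1)
    s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return s.sum() / len(i)

def rsa(X, Y):
    """Compare two representations of the same stimuli via their RDMs."""
    rdm_x, rdm_y = correlation_rdm(X), correlation_rdm(Y)
    iu = np.triu_indices_from(rdm_x, k=1)  # vectorize the upper triangle, no diagonal
    return kendall_tau_a(rdm_x[iu], rdm_y[iu])
```

Because correlation distance ignores per-pattern offset and scale, `rsa(X, 2 * X + 1)` returns 1; the pairwise τ_A loop is quadratic in the number of RDM entries, which is fine for a few dozen stimuli.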

Reviewer 03 · Rating 3 · Confidence 2

Strengths

There are several representational similarity metrics in the literature and there is relatively little understanding of their functional relationship. This work addresses this problem by comparing several metrics on a variety of behavioral tasks and models. The main strength of this paper is that it evaluates several similarity metrics (a total of 8 or 12, depending on whether each k-NN variant is counted separately) and behavioral metrics (a total of 9). Thus, this work offers benchmarks for the representat…
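The count of "8 or 12" metrics implies a k-nearest-neighbour measure evaluated at several values of k. A minimal sketch of one such score, the mean fraction of Euclidean k-nearest neighbours shared by two representations across stimuli, is below; the exact definition used in the paper is not given here, so treat this as a hypothetical illustration.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of each row's k nearest neighbours (Euclidean), excluding the row itself."""
    sq = (X ** 2).sum(axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared distances
    np.fill_diagonal(d, np.inf)                    # a stimulus is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]

def knn_overlap(X, Y, k):
    """Mean fraction of shared k-nearest neighbours per stimulus (1 = identical)."""
    nx, ny = knn_indices(X, k), knn_indices(Y, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nx, ny)]
    return float(np.mean(shared)) / k
```

Sweeping k (for instance `[knn_overlap(X, Y, k) for k in (5, 10, 20)]`) trades local against more global structure, which is why each k can be reported as a separate metric.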

Weaknesses

In many ways, this paper seems incomplete. Much of the paper reads like a methods paper even though no new methods are introduced. Arguably, much of sections 1.1 and 1.2 could be relegated to the appendices. Apart from measuring the correlation between similarity metrics and behavioral metrics, there is little interpretation or investigation in the results section. In this way, this paper has the feel of exploratory analysis without follow-up scientific hypotheses and analyses. Overall, I think…

Taxonomy

Topics: Neural Networks and Applications · Cognitive Science and Education Research

Methods: Procrustes