Discriminating image representations with principal distortions

Jenelle Feather; David Lipshutz; Sarah E. Harvey; Alex H. Williams; Eero P. Simoncelli

arXiv:2410.15433·q-bio.NC·May 19, 2025

Open Access 3 Reviews

TL;DR

This paper introduces a framework using Fisher information to compare image representations based on their local geometries, revealing differences in models and potential links to human perception.

Contribution

The paper presents a novel method for comparing image representations through local geometry analysis using Fisher information, enabling differentiation of models based on local sensitivities.

Findings

01. Identified principal distortions that differentiate models

02. Compared early visual system models using local geometry

03. Revealed architecture and training differences in neural networks

Abstract

Image representations (artificial or biological) are often compared in terms of their global geometric structure; however, representations with similar global structure can have strikingly different local geometries. Here, we propose a framework for comparing a set of image representations in terms of their local geometries. We quantify the local geometry of a representation using the Fisher information matrix, a standard statistical tool for characterizing the sensitivity to local stimulus distortions, and use this as a substrate for a metric on the local geometry in the vicinity of a base image. This metric may then be used to optimally differentiate a set of models, by finding a pair of "principal distortions" that maximize the variance of the models under this metric. As an example, we use this framework to compare a set of simple models of the early visual system, identifying a…
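To make the idea in the abstract concrete, here is a minimal sketch in plain Python of how a Fisher information matrix can measure a model's sensitivity to a local distortion. It rests on assumptions that are not taken from the paper: each model is a deterministic function with additive unit-variance Gaussian response noise, so the Fisher information reduces to J^T J (J being the Jacobian at the base image). The two toy models, the 2-pixel "image", and all function names are hypothetical. The distortions are hand-picked for illustration; the paper's optimization for finding principal distortions is not reproduced here.

```python
def jacobian(f, x, h=1e-5):
    """Central finite-difference Jacobian of f at x (rows: outputs, cols: inputs)."""
    n_out = len(f(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(n_out):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def fisher_information(f, x):
    """Fisher information at x, ASSUMING unit-variance Gaussian noise: I = J^T J."""
    J = jacobian(f, x)
    n = len(x)
    return [[sum(row[a] * row[b] for row in J) for b in range(n)]
            for a in range(n)]


def sensitivity(I, e):
    """Squared sensitivity e^T I e of a model to a distortion direction e."""
    n = len(e)
    return sum(e[a] * I[a][b] * e[b] for a in range(n) for b in range(n))


# Hypothetical toy models: each weights the two "pixels" differently.
def model_a(x):
    return [3.0 * x[0], 1.0 * x[1]]


def model_b(x):
    return [1.0 * x[0], 3.0 * x[1]]


base_image = [0.5, 0.5]
I_a = fisher_information(model_a, base_image)
I_b = fisher_information(model_b, base_image)

# Two hand-picked unit distortions (not optimized).
e1, e2 = [1.0, 0.0], [0.0, 1.0]

for name, I in (("model_a", I_a), ("model_b", I_b)):
    print(name, sensitivity(I, e1), sensitivity(I, e2))
```

In this toy case model_a is more sensitive to e1 and model_b to e2, so this pair of distortions separates the two models. For real networks the Jacobian would normally come from automatic differentiation rather than finite differences.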

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 6·Confidence 3

Strengths

- This is a well-written and organized paper; hence it is easy for readers to follow.
- The concept of exploring the local geometry of deep networks and examining the interplay between local geometry and the global structure of images is compelling.
- Developing a novel metric to compare two image representations is highly innovative.

Weaknesses

- While the proposed method of using the Fisher Information Matrix to measure the sensitivity of a representation to a stimulus distortion seems reasonable, its effectiveness as a metric is unclear, or difficult to justify.
- Identifying the types of principal distortions to which the network is most sensitive is an interesting idea. However, the proposed method lacks a good validation plan to confirm the accuracy or reliability of these findings.

Reviewer 02·Rating 5·Confidence 4

Strengths

The framework's focus on comparing local geometry is innovative, and the use of the Fisher information matrix and sensitivity to local distortions represents a unique approach to quantifying image representation. This novel method to derive principal distortion pairs that maximize model variance offers a potentially valuable tool for model discrimination.

Weaknesses

Model Selection: The authors demonstrate the metric’s functionality using older architectures, specifically AlexNet and ResNet. Given that AlexNet is largely obsolete in current practical applications, the paper would benefit from extending the evaluation to more contemporary and widely-used networks (e.g., EfficientNet, Vision Transformers). Demonstrating the framework's effectiveness across a variety of modern architectures would strengthen the claim that the metric is universally applicable a

Reviewer 03·Rating 6·Confidence 3

Strengths

1. The idea is simple but effective -- as the first approach to compare more than two models.
2. The problem statement and method description are well-written and clear to understand.
3. The experiment result is interesting, with meaningful discussions about texture bias and adversarial vulnerability. Visualizations are very helpful.
4. The supplementary is very informative.

Weaknesses

1. It is not clear how this paper's principal distortions relate to human sensitivity. There seem to be no experiments to prove this statement, such as using human observers for evaluation of the principal distortions [1].
2. For example, in 4.1 Early Vision Models, it is not stated how to determine the effectiveness of this approach. There is no quantitative evaluation, and it is unclear how to understand the visualizations of the principal distortion images.
3. Are there some practical applic

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Sparse Evolutionary Training · Balanced Selection