Transforming Neural Network Visual Representations to Predict Human Judgments of Similarity

Maria Attarian; Brett D. Roads; Michael C. Mozer

arXiv:2010.06512·cs.NE·January 13, 2021·6 cites

Open Access

TL;DR

This paper demonstrates that applying flexible linear transformations, including asymmetric ones, to deep neural network embeddings significantly improves their ability to predict human similarity judgments, aligning machine representations more closely with human perception.

Contribution

The study applies more expressive linear transformations, including asymmetric ones, to deep neural network embeddings so that they align more closely with human similarity judgments.

Findings

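01. Appropriate linear transformations of deep embeddings improve prediction of human binary choice on a data set of bird images from 72% at baseline to 89%.

02. Modeling similarity asymmetrically captures human judgments better than symmetric modeling.

03. Reducing the rank of the 4096-dimensional embeddings results in a loss of explanatory power.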

Abstract

Deep-learning vision models have shown intriguing similarities and differences with respect to human vision. We investigate how to bring machine visual representations into better alignment with human representations. Human representations are often inferred from behavioral evidence such as the selection of an image most similar to a query image. We find that with appropriate linear transformations of deep embeddings, we can improve prediction of human binary choice on a data set of bird images from 72% at baseline to 89%. We hypothesized that deep embeddings have redundant, high (4096) dimensional representations; however, reducing the rank of these representations results in a loss of explanatory power. We hypothesized that the dilation transformation of representations explored in past research is too restrictive, and indeed we found that model explanatory power can be significantly…
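To make the idea in the abstract concrete, the following is a minimal sketch of how a linear transformation of embeddings could be scored against a human binary choice. It is an illustration under stated assumptions, not the authors' implementation: the bilinear form `z_query^T W z_ref`, the logistic choice rule, the function names, and the toy 2-dimensional embeddings are all hypothetical (the paper's embeddings are 4096-dimensional, and its exact model is not given in the text above).

```python
import math

def bilinear_similarity(W, z_query, z_ref):
    """Assumed similarity: s(query, ref) = z_query^T W z_ref."""
    # Apply the linear transformation W to the reference embedding.
    W_ref = [sum(w * r for w, r in zip(row, z_ref)) for row in W]
    # Dot product of the query with the transformed reference.
    return sum(q * v for q, v in zip(z_query, W_ref))

def choice_probability(W, z_query, z_a, z_b):
    """Assumed choice rule: probability of picking image a over image b
    as most similar to the query, via a logistic on the similarity gap."""
    s_a = bilinear_similarity(W, z_query, z_a)
    s_b = bilinear_similarity(W, z_query, z_b)
    return 1.0 / (1.0 + math.exp(-(s_a - s_b)))

# Toy embeddings (hypothetical values, for illustration only).
z_q = [1.0, 0.0]
z_a = [0.0, 1.0]
z_b = [0.5, 0.0]

# Baseline: untransformed embeddings (W is the identity).
W_identity = [[1.0, 0.0], [0.0, 1.0]]
# A non-symmetric W, so s(x, y) need not equal s(y, x).
W_asym = [[1.0, 2.0], [0.0, 1.0]]

p_identity = choice_probability(W_identity, z_q, z_a, z_b)
p_asym = choice_probability(W_asym, z_q, z_a, z_b)

# Asymmetry check: swap the roles of query and reference.
s_forward = bilinear_similarity(W_asym, z_q, z_a)
s_backward = bilinear_similarity(W_asym, z_a, z_q)

print(p_identity, p_asym, s_forward, s_backward)
```

In this toy setup the identity matrix favors image b while the non-symmetric `W_asym` favors image a, and `s_forward` differs from `s_backward`. In practice `W` would be fit to recorded human choices rather than set by hand.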

Taxonomy

Topics: Visual Attention and Saliency Detection · Data Visualization and Analytics · Face Recognition and Perception