Transforming Neural Network Visual Representations to Predict Human Judgments of Similarity
Maria Attarian, Brett D. Roads, Michael C. Mozer

TL;DR
This paper shows that applying flexible linear transformations, including asymmetric ones, to deep neural network embeddings substantially improves their ability to predict human similarity judgments (from 72% to 89% accuracy on human binary choices among bird images), bringing machine representations into closer alignment with human perception.
Contribution
The study introduces the use of expressive linear transformations, including asymmetric ones, to bring neural network embeddings into closer alignment with human similarity judgments, going beyond the more restrictive dilation transformation explored in past research.
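The contrast between the restrictive dilation transformation and more expressive, possibly asymmetric, linear transformations can be sketched with a bilinear similarity. This is a minimal illustration under our own assumptions, not the paper's parameterization: dilation is modeled as a diagonal matrix that rescales each embedding dimension, a shared full matrix `A` gives a symmetric similarity `(Aq)·(Ar)`, and separate matrices `A` and `B` for the query and reference sides give `(Aq)·(Br)`, which need not equal the score with the roles swapped. The names `similarity`, `A`, and `B` are hypothetical, and the toy dimension 8 stands in for the 4096-dimensional embeddings.

```python
import numpy as np

def similarity(q, r, A, B=None):
    """Bilinear similarity s(q, r) = (A q) . (B r).

    Passing B=None reuses A on both sides (symmetric measure);
    a separate B for the reference side makes it asymmetric in general.
    """
    if B is None:
        B = A
    return float((A @ q) @ (B @ r))

rng = np.random.default_rng(0)
d = 8  # toy size standing in for the 4096-d embeddings

q = rng.normal(size=d)  # query-image embedding
r = rng.normal(size=d)  # reference-image embedding

# Dilation (assumed here to mean per-dimension rescaling): a diagonal matrix.
dilation = np.diag(rng.uniform(0.5, 2.0, size=d))
# Full linear transform shared by query and reference: still symmetric.
A = rng.normal(size=(d, d))
# Separate transform for the reference side: asymmetric.
B = rng.normal(size=(d, d))

for name, args in [("dilation", (dilation,)), ("shared A", (A,)), ("A and B", (A, B))]:
    forward = similarity(q, r, *args)
    backward = similarity(r, q, *args)
    print(f"{name:>8}: s(q,r)={forward:+.3f}  s(r,q)={backward:+.3f}  "
          f"symmetric={np.isclose(forward, backward)}")
```

Tying `B` to `A` forces the implied matrix `A^T A` to be symmetric; untying them allows any bilinear matrix `A^T B`, which is what lets swapping query and reference change the score.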
Findings
Linear transformations raise accuracy in predicting human binary choices on a bird-image dataset from 72% (baseline) to 89%.
Asymmetric similarity modeling better captures human judgments.
Reducing the rank of the 4096-dimensional embeddings decreases explanatory power.
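The rank-reduction finding concerns transforms that compress the embedding. The source does not say how the rank was reduced; one way to picture it, assumed here, is a `k x d` projection `P` applied to both embeddings, so the implied bilinear matrix `P^T P` has rank at most `k` and the transform has `k*d` parameters instead of `d*d`. `rank_k_projection` and `low_rank_similarity` are hypothetical helpers, and the random `P` stands in for a learned one.

```python
import numpy as np

def rank_k_projection(d, k, rng):
    """Hypothetical helper: a random k x d map standing in for a learned rank-k transform."""
    return rng.normal(size=(k, d))

def low_rank_similarity(q, r, P):
    """Similarity after projecting both embeddings into k dimensions: (P q) . (P r)."""
    return float((P @ q) @ (P @ r))

rng = np.random.default_rng(1)
d = 64  # toy stand-in for the 4096-d embeddings
q, r = rng.normal(size=d), rng.normal(size=d)

for k in (4, 16, 64):
    P = rank_k_projection(d, k, rng)
    W = P.T @ P  # implied bilinear matrix, so s(q, r) = q^T W r
    print(f"k={k:>2}: rank(W)={np.linalg.matrix_rank(W)}, "
          f"parameters={P.size} (full: {d * d}), s(q, r)={low_rank_similarity(q, r, P):+.3f}")
```

Capping the rank at `k` limits how many directions in embedding space can contribute to the similarity, which is one way a rank reduction could cost explanatory power.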
Abstract
Deep-learning vision models have shown intriguing similarities and differences with respect to human vision. We investigate how to bring machine visual representations into better alignment with human representations. Human representations are often inferred from behavioral evidence such as the selection of an image most similar to a query image. We find that with appropriate linear transformations of deep embeddings, we can improve prediction of human binary choice on a data set of bird images from 72% at baseline to 89%. We hypothesized that deep embeddings have redundant, high (4096) dimensional representations; however, reducing the rank of these representations results in a loss of explanatory power. We hypothesized that the dilation transformation of representations explored in past research is too restrictive, and indeed we found that model explanatory power can be significantly…
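The binary-choice task in the abstract (selecting which of two images is more similar to a query) can be turned into a prediction problem roughly as follows. This sketch assumes a bilinear similarity `s(q, r) = q^T W r` with an unconstrained, possibly asymmetric `W`, a logistic choice rule on the similarity difference, and plain gradient ascent starting from the identity (raw embeddings). None of these choices, nor the helper names, come from the paper. The demo data are synthetic: "human" choices are generated from a hidden random matrix, so the printed accuracies say nothing about the paper's results.

```python
import numpy as np

def sigmoid(x):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def similarity_margin(Q, RA, RB, W):
    """Per trial: s(q, a) - s(q, b), with s(q, r) = q^T W r."""
    return np.einsum("nd,de,ne->n", Q, W, RA - RB)

def predict_choices(Q, RA, RB, W):
    """1 if reference A is predicted as more similar to the query, else 0."""
    return (similarity_margin(Q, RA, RB, W) > 0).astype(int)

def accuracy(pred, human):
    return float(np.mean(pred == human))

def fit_transform(Q, RA, RB, human, steps=200, lr=0.1):
    """Gradient ascent on the mean log-likelihood of the observed choices."""
    W = np.eye(Q.shape[1])  # start from the raw embeddings
    diff = RA - RB
    for _ in range(steps):
        p = sigmoid(similarity_margin(Q, RA, RB, W))  # P(choose A)
        # d/dW of q^T W (a - b) is the outer product q (a - b)^T.
        W += lr * Q.T @ ((human - p)[:, None] * diff) / len(Q)
    return W

# Synthetic demo: "human" choices come from a hidden asymmetric matrix.
rng = np.random.default_rng(0)
n, d = 500, 10
Q, RA, RB = (rng.normal(size=(n, d)) for _ in range(3))
W_hidden = rng.normal(size=(d, d))
human = predict_choices(Q, RA, RB, W_hidden)

baseline = accuracy(predict_choices(Q, RA, RB, np.eye(d)), human)
W_fit = fit_transform(Q, RA, RB, human)
fitted = accuracy(predict_choices(Q, RA, RB, W_fit), human)
print(f"identity: {baseline:.2f}  fitted: {fitted:.2f}")
```

Because `W` is left unconstrained, the fitted similarity can be asymmetric; restricting `W` to a diagonal matrix would recover a dilation-style model for comparison.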
Taxonomy
Topics: Visual Attention and Saliency Detection · Data Visualization and Analytics · Face Recognition and Perception
