Saliency Suppressed, Semantics Surfaced: Visual Transformations in Neural Networks and the Brain

Gustaw Opiełka; Jessica Loke; Steven Scholte

arXiv:2404.18772 · cs.CV · April 30, 2024 · 1 citation

TL;DR

This paper investigates how neural networks encode visual saliency and semantic similarity. It finds that ResNets and ViTs differ in their sensitivity to saliency, that networks suppress saliency in early layers, and that natural language supervision (CLIP) strengthens semantic encoding, which is associated with closer alignment to human perception.

Contribution

The paper introduces a custom image dataset in which salient and semantic information is systematically manipulated, and it uses representational analysis to compare saliency and semantic encoding in neural networks and the brain across architectures and supervision methods.

Findings

01. ResNets are more sensitive to saliency than ViTs when trained with object classification objectives.

02. Networks suppress saliency in early layers; in ResNets, CLIP supervision enhances this suppression.

03. Semantic encoding correlates with better alignment to human perception.

Abstract

Deep learning algorithms lack human-interpretable accounts of how they transform raw visual input into a robust semantic understanding, which impedes comparisons between different architectures, training objectives, and the human brain. In this work, we take inspiration from neuroscience and employ representational approaches to shed light on how neural networks encode information at low (visual saliency) and high (semantic similarity) levels of abstraction. Moreover, we introduce a custom image dataset where we systematically manipulate salient and semantic information. We find that ResNets are more sensitive to saliency information than ViTs, when trained with object classification objectives. We uncover that networks suppress saliency in early layers, a process enhanced by natural language supervision (CLIP) in ResNets. CLIP also enhances semantic encoding in both architectures.…
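
The "representational approaches" mentioned in the abstract can be illustrated with a minimal representational similarity analysis (RSA) sketch. This is not the authors' pipeline: the function names, the toy data, the 1 - Pearson dissimilarity measure, and the Spearman comparison are all assumptions chosen for illustration. The idea is to build a representational dissimilarity matrix (RDM) from a layer's activations and rank-correlate it with a reference RDM that encodes, for example, semantic category.

```python
import numpy as np

def rdm(features):
    """Dissimilarity matrix: 1 - Pearson r between rows (one row per image)."""
    return 1.0 - np.corrcoef(features)

def upper(matrix):
    """Off-diagonal upper-triangle entries as a flat vector."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def average_ranks(x):
    """Ranks in which tied values share their mean rank."""
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x), dtype=float)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def rsa_score(layer_features, reference_rdm):
    """Spearman correlation between a layer RDM and a reference RDM."""
    a = average_ranks(upper(rdm(layer_features)))
    b = average_ranks(upper(reference_rdm))
    return float(np.corrcoef(a, b)[0, 1])

# Toy data (hypothetical): 6 images from 2 semantic categories, 50 features.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1, 1])
prototypes = rng.normal(size=(2, 50))
features = prototypes[labels] + 0.1 * rng.normal(size=(6, 50))

# Reference RDM: 0 for same-category pairs, 1 for different-category pairs.
semantic_rdm = (labels[:, None] != labels[None, :]).astype(float)

score = rsa_score(features, semantic_rdm)
print(f"RSA score against the semantic reference: {score:.2f}")
```

In a setting like the one the abstract describes, `features` would be replaced by activations from a given network layer, and the score would be computed per layer against both a saliency-based and a semantics-based reference RDM to see where each kind of information is encoded or suppressed.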

Code & Models

Repositories

gucioopielka/saliency-semantic-rsa (PyTorch, official)

Taxonomy

Topics: Aesthetic Perception and Analysis

Methods: Contrastive Language-Image Pre-training