An extremely coarse feedback signal is sufficient for learning human-aligned visual representations

Yash Mehta; Michael F. Bonner

arXiv:2605.05556 · cs.CV · May 8, 2026

TL;DR

This study shows that neural networks trained with coarse, broad categories develop visual representations that align closely with human perception and brain data, challenging the need for fine-grained supervision.

Contribution

The paper systematically investigates how the granularity of the supervisory signal affects the learning of brain-like visual representations, and finds that coarse signals are surprisingly effective.

Findings

01. Networks trained on as few as 8 categories match or exceed the neural alignment of models trained with fine-grained supervision.

02. Coarse training improves alignment with human perceptual similarity more than fine-grained supervision does.

03. Coarse feedback signals can produce human-like visual representations in neural networks.

Abstract

Artificial neural networks trained on visual tasks develop internal representations resembling those of the primate visual system, a discovery that has guided a decade of computational neuroscience. Research on building brain-aligned models has progressively embraced finer-grained supervisory signals, from object classification to contrastive self-supervised objectives that maximize distinctions among individual images, yet the role of supervisory signal granularity in brain alignment remains largely unexamined. Here we systematically investigate how the coarseness of a learning signal shapes representational alignment with human vision. We parametrically vary the level of signal granularity using a data-driven approach that partitions a set of training images into varied numbers of categories (2, 4, 8, 16, ..., 64) via PCA-based splits of pretrained embeddings. We train hundreds of…
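
The abstract does not spell out how the PCA-based splits are computed, so the following is only an illustrative sketch of one plausible reading, not the authors' procedure. It assumes that each existing group is repeatedly cut in two along its own first principal component at the median score, so that `depth` rounds yield 2**depth coarse categories. The function name `pca_split_labels`, the median threshold, and the random stand-in embeddings are all hypothetical.

```python
import numpy as np

def pca_split_labels(embeddings, depth):
    """Assign each row one of 2**depth coarse labels via recursive PCA splits.

    Assumption (not from the paper): every group is split at the median of
    its projection onto its own first principal component.
    """
    X = np.asarray(embeddings, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(depth):
        new_labels = np.empty_like(labels)
        for lab in np.unique(labels):
            idx = np.flatnonzero(labels == lab)
            centered = X[idx] - X[idx].mean(axis=0)
            # The first right singular vector is the group's first principal axis.
            _, _, vt = np.linalg.svd(centered, full_matrices=False)
            scores = centered @ vt[0]
            # A median threshold keeps the two child groups balanced.
            upper = scores > np.median(scores)
            new_labels[idx] = 2 * lab + upper.astype(int)
        labels = new_labels
    return labels

# Random vectors stand in for pretrained image embeddings (hypothetical data).
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(512, 32))
for depth in (1, 2, 3):
    coarse = pca_split_labels(fake_embeddings, depth)
    print(2 ** depth, "categories, sizes:", np.bincount(coarse))
```

The resulting integer labels could then serve as classification targets of varying coarseness. Under this reading, smaller `depth` values give a coarser feedback signal.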
