Disentangling the Factors of Convergence between Brains and Computer Vision Models

Joséphine Raugel; Marc Szafraniec; Huy V. Vo; Camille Couprie; Patrick Labatut; Piotr Bojanowski; Valentin Wyart; Jean-Rémi King

arXiv:2508.18226 · cs.AI · August 26, 2025

TL;DR

This study systematically investigates how model size, training data, and architecture influence the development of brain-like visual representations in self-supervised vision transformers, revealing a developmental trajectory aligned with human cortical features.

Contribution

The paper analyzes which factors affect brain-model similarity, showing that human-like representations emerge during training and relating that emergence to cortical development.

Findings

01. Larger models trained on human-centric images show higher brain similarity.

02. Brain-like representations emerge progressively, first aligning with sensory cortices, then with higher-order areas.

03. Representation development correlates with cortical features like expansion, thickness, and timescales.

Abstract

Many AI models trained on natural images develop representations that resemble those of the human brain. However, the factors that drive this brain-model similarity remain poorly understood. To disentangle how the model, training and data independently lead a neural network to develop brain-like representations, we trained a family of self-supervised vision transformers (DINOv3) that systematically varied these different factors. We compare their representations of images to those of the human brain recorded with both fMRI and MEG, providing high resolution in spatial and temporal analyses. We assess the brain-model similarity with three complementary metrics focusing on overall representational similarity, topographical organization, and temporal dynamics. We show that all three factors - model size, training amount, and image type - independently and interactively impact each of these…
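The abstract refers to a metric of overall representational similarity without defining it here. As a rough illustration only, the sketch below uses generic representational similarity analysis (RSA), which may differ from the metric the authors actually use. All names (`rdm`, `rsa_score`) and the synthetic data are assumptions made for this example; nothing in it comes from the paper's data or code.

```python
import numpy as np


def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson r between stimulus rows."""
    return 1.0 - np.corrcoef(features)


def rsa_score(model_features, brain_responses):
    """Pearson correlation between the upper triangles of the two RDMs.

    Both inputs have shape (n_stimuli, n_dimensions); the number of
    dimensions (model units, voxels, sensors) may differ between them.
    """
    n = model_features.shape[0]
    upper = np.triu_indices(n, k=1)  # each stimulus pair once, no diagonal
    model_rdm = rdm(model_features)[upper]
    brain_rdm = rdm(brain_responses)[upper]
    return float(np.corrcoef(model_rdm, brain_rdm)[0, 1])


# Synthetic stand-ins (hypothetical): a shared latent structure drives both
# the "model" features and the "brain" responses; "unrelated" is pure noise.
rng = np.random.default_rng(0)
n_images = 40
latent = rng.normal(size=(n_images, 5))
model_features = latent @ rng.normal(size=(5, 64))
brain_responses = latent @ rng.normal(size=(5, 100)) + 0.1 * rng.normal(size=(n_images, 100))
unrelated = rng.normal(size=(n_images, 100))

shared_score = rsa_score(model_features, brain_responses)
unrelated_score = rsa_score(model_features, unrelated)
print(f"shared structure: {shared_score:.2f}, unrelated: {unrelated_score:.2f}")
```

Because RSA compares the geometry of stimulus pairs rather than individual units, the model and brain data do not need matching dimensionality, which is one reason it is a common first choice for this kind of comparison.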
