Do Models See in Line with Human Vision? Probing the Correspondence Between LVLM Representations and EEG Signals

Xin Xiao; Yang Lei; Haoyang Zeng; Xiao Sun; Xinyi Jiang; Yu Tian; Hao Wu; Kaiwen Wei; Jiang Zhong

arXiv:2603.08303 · cs.HC · March 10, 2026


Open Access

TL;DR

This study measures how well the internal representations of large vision language models (LVLMs) align with human EEG signals. Certain layers and architectures show brain-like visual processing patterns, which the authors propose as a neuro-inspired evaluation benchmark.

Contribution

The paper introduces a novel method to quantify LVLM-brain alignment using EEG signals, highlighting the influence of architecture over scale and linking model performance to neural similarity.

Findings

01 Intermediate layers align with EEG in the 100-300 ms window

02 Architectural design impacts brain alignment more than scale

03 Models with better visual performance show higher EEG similarity
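A finding such as the first one is typically read off a layer-by-time alignment map. The sketch below shows one way such a map could be computed with representational similarity analysis (RSA), on synthetic data. It is not the authors' pipeline: the function names, array shapes, the correlation-distance RDM, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def rdm(x):
    """Representational dissimilarity matrix: 1 - Pearson r between rows (one row per image)."""
    return 1.0 - np.corrcoef(x)

def spearman(a, b):
    """Spearman rank correlation of two 1-D arrays (assumes no ties, as with continuous data)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

def rsa_score(feats, eeg_t):
    """Compare two (n_images, n_dims) arrays through the upper triangles of their RDMs."""
    iu = np.triu_indices(feats.shape[0], k=1)
    return spearman(rdm(feats)[iu], rdm(eeg_t)[iu])

def layer_time_alignment(layer_feats, eeg):
    """layer_feats: list of (n_images, n_units) arrays, one per layer.
    eeg: (n_images, n_channels, n_times) array.
    Returns an (n_layers, n_times) matrix of RSA scores."""
    n_times = eeg.shape[2]
    return np.array([[rsa_score(f, eeg[:, :, t]) for t in range(n_times)]
                     for f in layer_feats])

# Synthetic demo (hypothetical data): only "layer 1" and time point 1 share structure.
rng = np.random.default_rng(0)
n_images = 40
latent = rng.normal(size=(n_images, 5))
layer_feats = [
    rng.normal(size=(n_images, 64)),       # "layer 0": unrelated noise
    latent @ rng.normal(size=(5, 64)),     # "layer 1": carries the shared structure
]
eeg = rng.normal(size=(n_images, 32, 3))   # three time points, noise by default
eeg[:, :, 1] = latent @ rng.normal(size=(5, 32)) + 0.1 * rng.normal(size=(n_images, 32))

scores = layer_time_alignment(layer_feats, eeg)
peak_layer, peak_time = np.unravel_index(np.argmax(scores), scores.shape)
print(scores.round(2), peak_layer, peak_time)
```

With real data, each time index would correspond to a latency after image onset, so a peak in the rows for intermediate layers and the columns for 100-300 ms would match the pattern reported here.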

Abstract

Large Vision Language Models (LVLMs) exhibit strong visual understanding and reasoning abilities. However, whether their internal representations reflect human visual cognition is still under-explored. In this paper, we address this by quantifying LVLM-brain alignment using image-evoked electroencephalogram (EEG) signals, analyzing the effects of model architecture, scale, and image type. Specifically, using ridge regression and representational similarity analysis, we compare visual representations from 32 open-source LVLMs with the corresponding EEG responses. We observe a structured LVLM-brain correspondence. First, intermediate layers (8-16) show peak alignment with EEG activity in the 100-300 ms window, consistent with hierarchical human visual processing. Second, multimodal architectural design contributes 3.4 more to brain alignment than parameter scaling, and models with…
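The abstract names ridge regression as one of the two alignment measures but does not give details. The sketch below illustrates a common form of such an analysis: a linear encoding model that predicts EEG channels from model features and is scored by held-out Pearson correlation. The mapping direction (features to EEG), the closed-form solver, the fixed penalty `alpha`, and the synthetic data are assumptions for illustration, not the paper's stated procedure.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge solution W = (X^T X + alpha * I)^-1 X^T Y."""
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_feat), X.T @ Y)

def columnwise_pearson(A, B):
    """Pearson r between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / (np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0))

def encoding_score(X_train, Y_train, X_test, Y_test, alpha=1.0):
    """Fit features -> EEG on the training images, return per-channel r on held-out images."""
    mu_x, mu_y = X_train.mean(axis=0), Y_train.mean(axis=0)
    W = fit_ridge(X_train - mu_x, Y_train - mu_y, alpha)
    Y_pred = (X_test - mu_x) @ W + mu_y
    return columnwise_pearson(Y_pred, Y_test)

# Synthetic demo (hypothetical data): EEG is a noisy linear readout of the features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                    # 200 images x 30 model features
Y = X @ rng.normal(size=(30, 8)) + 0.5 * rng.normal(size=(200, 8))  # 8 EEG channels
X_tr, X_te, Y_tr, Y_te = X[:150], X[150:], Y[:150], Y[150:]

scores = encoding_score(X_tr, Y_tr, X_te, Y_te)

# Control: break the image-response pairing in training; the score should collapse.
control = encoding_score(X_tr, Y_tr[rng.permutation(150)], X_te, Y_te)
print(scores.mean(), control.mean())
```

In practice the penalty would be chosen by cross-validation and the score computed per time window, so the shuffled control gives a baseline against which real alignment can be judged.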


Taxonomy

Topics: Face Recognition and Perception · Multimodal Machine Learning Applications · Neurobiology of Language and Bilingualism