Do MLLMs Exhibit Human-like Perceptual Behaviors? HVSBench: A Benchmark for MLLM Alignment with Human Perceptual Behavior

Jiaying Lin; Shuquan Ye; Dan Xu; Wanli Ouyang; Rynson W.H. Lau

arXiv:2412.09603 · cs.CV · December 18, 2025

TL;DR

HVSBench is a large-scale benchmark designed to evaluate whether Multimodal Large Language Models (MLLMs) exhibit human-like perceptual behaviors across various visual tasks, revealing a significant perceptual gap compared to humans.

Contribution

This paper introduces HVSBench, the first comprehensive benchmark for assessing MLLM alignment with human visual perception, with over 85,000 samples spanning 13 categories across 5 fields.

Findings

01. MLLMs achieve only moderate performance on HVSBench
02. Humans significantly outperform MLLMs in perceptual tasks
03. The benchmark highlights the perceptual gap and the need for more human-aligned models

Abstract

While Multimodal Large Language Models (MLLMs) excel at many vision tasks, it is unknown if they exhibit human-like perceptual behaviors. To evaluate this, we introduce HVSBench, the first large-scale benchmark with over 85,000 samples designed to test MLLM alignment with the human visual system (HVS). The benchmark covers 13 categories across 5 key fields: Prominence, Subitizing, Prioritizing, Free-Viewing, and Searching. Our comprehensive evaluation reveals a significant perceptual gap: even state-of-the-art MLLMs achieve only moderate results. In contrast, human participants demonstrate strong performance, significantly outperforming all models. This underscores the high quality of HVSBench and the need for more human-aligned AI. We believe our benchmark will be a critical tool for developing the next generation of explainable MLLMs.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling