Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?
Caixin Kang, Tianyu Yan, Sitong Gong, Mingfang Zhang, Liangyang Ouyang, Ruicong Liu, Bo Zheng, Huchuan Lu, Kaipeng Zhang, Yoichi Sato, Yifei Huang

TL;DR
This paper introduces Grounded Personality Reasoning (GPR), a new task with an accompanying dataset and benchmark, to evaluate whether Multimodal Large Language Models (MLLMs) genuinely understand personality traits through behavioral evidence or rely on superficial cues.
Contribution
It formalizes GPR as a new task, releases the MM-OCEAN dataset with timestamped behavioral observations, and benchmarks 27 MLLMs to analyze their reasoning and grounding capabilities.
Findings
51% of correct ratings are not grounded in cues
Holistic-Grounding Rate ranges only from 0% to 33.5%
Significant Prejudice Gap in model reasoning
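To make the first finding concrete, here is a minimal sketch of how a share like "correct ratings not grounded in cues" could be computed. The record layout, the field names, and the function `ungrounded_share` are illustrative assumptions, not the paper's evaluation code, and the toy records are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class TraitResult:
    """One model judgment for one trait of one video (hypothetical layout)."""
    rating_correct: bool  # predicted Big Five rating matched the reference
    cues_grounded: bool   # model also got the cue-grounding questions right


def ungrounded_share(results):
    """Fraction of correct ratings that lack cue grounding.

    Returns None when no rating is correct, since the share is undefined.
    """
    correct = [r for r in results if r.rating_correct]
    if not correct:
        return None
    ungrounded = sum(1 for r in correct if not r.cues_grounded)
    return ungrounded / len(correct)


# Toy records (invented): four correct ratings, two of them ungrounded.
toy = [
    TraitResult(True, True),
    TraitResult(True, False),
    TraitResult(True, False),
    TraitResult(False, False),
    TraitResult(True, True),
]
share = ungrounded_share(toy)
print(f"{share:.0%} of correct ratings are ungrounded")
```

Conditioning on correct ratings is the point of the sketch: a model can score well on rating accuracy while a large part of that accuracy is unsupported by observable behavior.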
Abstract
Multimodal Large Language Models (MLLMs) are increasingly deployed in human-facing roles where personality perception is critical, yet existing benchmarks evaluate this capability solely on numerical Big Five score prediction, leaving open whether models truly perceive personality through behavioral understanding or merely prejudge through superficial pattern matching. We address this gap with three contributions. (i) A new task: we formalize Grounded Personality Reasoning (GPR), which requires MLLMs to anchor each Big Five rating in observable evidence through a chain of rating, reasoning, and grounding. (ii) A new dataset: we release MM-OCEAN (1,104 videos, 5,320 MCQs), produced by a multi-agent pipeline with human verification, with timestamped behavioral observations, evidence-grounded trait analyses, and seven categories of cue-grounding MCQs. (iii) Benchmark and analysis: we…
