Affordance Benchmark for MLLMs

Junying Wang; Wenzhe Li; Yalun Wu; Yingji Liang; Yijin Guo; Chunyi Li; Haodong Duan; Zicheng Zhang; Guangtao Zhai

arXiv:2506.00893 · cs.CL · August 5, 2025

TL;DR

This paper introduces A4Bench, a benchmark to evaluate how well Multimodal Large Language Models perceive affordances, revealing significant gaps compared to human understanding, especially in dynamic and contextual scenarios.

Contribution

The paper presents A4Bench, a comprehensive benchmark for assessing affordance perception in MLLMs, including new datasets for constitutive and transformative affordances, and evaluates 17 models against human performance.

Findings

01 Proprietary models outperform open-source models.

02 All models perform significantly below human levels.

03 Transformative affordance perception is particularly challenging.

Abstract

Affordance theory suggests that environments inherently provide action possibilities shaping perception and behavior. While Multimodal Large Language Models (MLLMs) achieve strong performance in vision-language tasks, their ability to perceive affordance, which is crucial for intuitive and safe interactions, remains underexplored. To address this, we introduce **A4Bench**, a novel benchmark designed to evaluate the affordance perception abilities of MLLMs across two dimensions: 1) Constitutive Affordance, assessing understanding of inherent object properties through 1,282 question-answer pairs spanning nine sub-disciplines, and 2) Transformative Affordance, probing dynamic and contextual nuances (e.g., misleading, time-dependent, cultural, or individual-specific affordance) with 718 challenging question-answer pairs. We evaluate 17 MLLMs (nine proprietary and eight open-source) and…
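The excerpt above does not describe how answers are scored. As a minimal sketch, assuming each item has a single gold answer and that accuracy is reported separately for the two dimensions, a per-dimension tally might look like the following. The record keys (`dimension`, `gold`, `pred`) and the toy records are hypothetical and are not taken from the A4Bench repository; only the two pair counts come from the abstract.

```python
from collections import defaultdict

# Question-answer pair counts quoted in the abstract.
PAIR_COUNTS = {"constitutive": 1282, "transformative": 718}


def accuracy_by_dimension(records):
    """Return {dimension: accuracy} for an iterable of answer records.

    Each record is a dict with hypothetical keys:
      'dimension' - "constitutive" or "transformative"
      'gold'      - the reference answer label
      'pred'      - the model's answer label
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["dimension"]] += 1
        if record["pred"] == record["gold"]:
            correct[record["dimension"]] += 1
    return {dim: correct[dim] / total[dim] for dim in total}


# Toy records for illustration only; these are not benchmark data.
toy_records = [
    {"dimension": "constitutive", "gold": "A", "pred": "A"},
    {"dimension": "constitutive", "gold": "B", "pred": "C"},
    {"dimension": "transformative", "gold": "D", "pred": "D"},
]

scores = accuracy_by_dimension(toy_records)
total_pairs = sum(PAIR_COUNTS.values())
```

Reporting the two dimensions separately, rather than as one pooled score, keeps the larger constitutive set from masking weaker results on the smaller transformative set.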

Code & Models

Repositories

junyingwang959/a4bench (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)