Bridging Perception and Language: A Systematic Benchmark for LVLMs' Understanding of Amodal Completion Reports

Amane Watahiki, Tomoki Doi, Taiga Shinozaki, Satoshi Nishida, Takuya Niikawa, Katsunori Miyahara, Hitomi Yanaka

arXiv:2507.05799 · cs.CL · July 9, 2025

Open Access

TL;DR

This paper introduces a benchmark for evaluating how well large vision-language models (LVLMs) understand reports of amodal completion. It reveals strengths and weaknesses that vary across object categories and prompt languages, with weaknesses most apparent under Japanese prompts.

Contribution

It provides a systematic classification of amodal completion grounded in a formal ontology (Basic Formal Ontology) and uses it to probe LVLMs' inferential capabilities and linguistic limitations.

Findings

01

Many LVLMs perform comparably to humans overall

02

Accuracy varies by object category and stimulus type

03

Japanese prompts reveal linguistic deficiencies in models

Abstract

One of the main objectives in developing large vision-language models (LVLMs) is to engineer systems that can assist humans with multimodal tasks, including interpreting descriptions of perceptual experiences. A central phenomenon in this context is amodal completion, in which people perceive objects even when parts of those objects are hidden. Although numerous studies have assessed whether computer-vision algorithms can detect or reconstruct occluded regions, the inferential abilities of LVLMs on texts related to amodal completion remain unexplored. To address this gap, we constructed a benchmark grounded in Basic Formal Ontology to achieve a systematic classification of amodal completion. Our results indicate that while many LVLMs achieve human-comparable performance overall, their accuracy diverges for certain types of objects being completed. Notably, in certain categories, some…
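The abstract reports results broken down by the type of object being completed, and the findings add a breakdown by prompt language. The sketch below shows one generic way such per-group accuracies and model-versus-human gaps could be computed. It is a minimal illustration under stated assumptions, not the authors' evaluation code: the `Item` record, the category labels (loosely modeled on Basic Formal Ontology class names), the 0.1 gap threshold, and every number in the toy data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Item:
    """One hypothetical benchmark item paired with a model's answer."""
    category: str    # hypothetical ontology-derived category label
    language: str    # prompt language, e.g. "en" or "ja"
    gold: str        # gold answer label
    prediction: str  # model's answer label


def accuracy_by(items: Iterable[Item], key: Callable[[Item], str]) -> Dict[str, float]:
    """Group items by key(item) and return the fraction answered correctly per group."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for it in items:
        k = key(it)
        total[k] += 1
        correct[k] += int(it.prediction == it.gold)
    return {k: correct[k] / total[k] for k in total}


def flag_gaps(model_acc: Dict[str, float], human_acc: Dict[str, float],
              threshold: float = 0.1) -> List[str]:
    """Return groups where the model trails the human baseline by more than threshold."""
    return sorted(
        k for k, h in human_acc.items()
        if k in model_acc and h - model_acc[k] > threshold
    )


# Toy data, invented purely for illustration (not results from the paper).
items = [
    Item("object", "en", "yes", "yes"),
    Item("object", "ja", "yes", "yes"),
    Item("object_aggregate", "en", "no", "no"),
    Item("object_aggregate", "ja", "no", "yes"),
]
by_category = accuracy_by(items, lambda it: it.category)
by_language = accuracy_by(items, lambda it: it.language)
human_baseline = {"object": 0.95, "object_aggregate": 0.9}  # hypothetical

print(by_category)                              # {'object': 1.0, 'object_aggregate': 0.5}
print(by_language)                              # {'en': 1.0, 'ja': 0.5}
print(flag_gaps(by_category, human_baseline))   # ['object_aggregate']
```

Grouping by an arbitrary key function keeps one scoring routine for both breakdowns (object category and prompt language); a real analysis would also need per-group item counts and some test of whether a gap is statistically meaningful.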

Taxonomy

Topics: Multimodal Machine Learning Applications · Action Observation and Synchronization · Neurobiology of Language and Bilingualism