Bridging Perception and Language: A Systematic Benchmark for LVLMs' Understanding of Amodal Completion Reports
Amane Watahiki, Tomoki Doi, Taiga Shinozaki, Satoshi Nishida, Takuya Niikawa, Katsunori Miyahara, Hitomi Yanaka

TL;DR
This paper introduces a benchmark for evaluating how well large vision-language models (LVLMs) understand reports of amodal completion, revealing strengths and weaknesses across object categories and prompt languages, particularly Japanese.
Contribution
It grounds the benchmark in Basic Formal Ontology to classify amodal completion systematically, providing insights into LVLMs' inferential capabilities and linguistic limitations.
Findings
Many LVLMs achieve human-comparable performance overall
Accuracy varies by object category and stimulus type
Japanese prompts reveal linguistic deficiencies in models
Abstract
One of the main objectives in developing large vision-language models (LVLMs) is to engineer systems that can assist humans with multimodal tasks, including interpreting descriptions of perceptual experiences. A central phenomenon in this context is amodal completion, in which people perceive objects even when parts of those objects are hidden. Although numerous studies have assessed whether computer-vision algorithms can detect or reconstruct occluded regions, the inferential abilities of LVLMs on texts related to amodal completion remain unexplored. To address this gap, we constructed a benchmark grounded in Basic Formal Ontology to achieve a systematic classification of amodal completion. Our results indicate that while many LVLMs achieve human-comparable performance overall, their accuracy diverges for certain types of objects being completed. Notably, in certain categories, some…
Taxonomy
Topics: Multimodal Machine Learning Applications · Action Observation and Synchronization · Neurobiology of Language and Bilingualism
