A Computational Approach to Visual Metonymy
Saptarshi Ghosh, Linfeng Liu, Tianyu Jiang

TL;DR
This paper presents the first computational study of visual metonymy. It uses large language models and text-to-image models to generate metonymic images, then tests how well multimodal models interpret these indirect visual references, revealing a significant gap relative to humans.
Contribution
The paper presents a new pipeline grounded in semiotic theory for generating and analyzing visual metonymy, and introduces ViMET, the first visual metonymy dataset, comprising 2,000 multiple-choice questions for evaluating multimodal reasoning.
Findings
Humans achieve 86.9% accuracy on visual metonymy tasks.
State-of-the-art vision-language models reach only 65.9% accuracy, 21.0 percentage points below humans.
The results highlight current limitations in machine understanding of indirect visual cues.
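The figures above are multiple-choice accuracies, so the comparison comes down to simple arithmetic. The following is a minimal sketch of how such a score and the human-model gap could be computed. It is not the authors' evaluation code: the `accuracy` helper, the option labels, and the toy answer lists are hypothetical and made up for illustration. Only the two reported percentages come from the summary.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage of multiple-choice predictions matching the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Toy data (hypothetical, not drawn from ViMET): 3 of 4 predictions are correct.
toy_answers = ["A", "C", "B", "D"]
toy_predictions = ["A", "C", "D", "D"]
toy_acc = accuracy(toy_predictions, toy_answers)

# Accuracies as reported in the findings above.
HUMAN_ACC = 86.9
MODEL_ACC = 65.9

# Gap in percentage points; rounding avoids floating-point noise.
gap_points = round(HUMAN_ACC - MODEL_ACC, 1)

print(f"toy accuracy: {toy_acc:.1f}%")
print(f"human-model gap: {gap_points:.1f} percentage points")
```

The gap is stated in percentage points rather than as a relative percentage, which is the usual convention when comparing two accuracy scores on the same benchmark.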
Abstract
Images often communicate more than they literally depict: a set of tools can suggest an occupation and a cultural artifact can suggest a tradition. This kind of indirect visual reference, known as visual metonymy, invites viewers to recover a target concept via associated cues rather than explicit depiction. In this work, we present the first computational investigation of visual metonymy. We introduce a novel pipeline grounded in semiotic theory that leverages large language models and text-to-image models to generate metonymic visual representations. Using this framework, we construct ViMET, the first visual metonymy dataset comprising 2,000 multiple-choice questions to evaluate the cognitive reasoning abilities in multimodal language models. Experimental results on our dataset reveal a significant gap between human performance (86.9%) and state-of-the-art vision-language models…
Taxonomy
Topics: Multimodal Machine Learning Applications · Language, Metaphor, and Cognition · Data Visualization and Analytics
