Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding
Tuo Zhang, Tiantian Feng, Yibin Ni, Mengqin Cao, Ruying Liu, Katharine Butler, Yanjun Weng, Mi Zhang, Shrikanth S. Narayanan, Salman Avestimehr

TL;DR
This paper introduces the Pun Rebus Art Dataset, a multimodal Chinese cultural art dataset, to evaluate and improve vision-language models' understanding of traditional Chinese rebus art and its symbolic meanings.
Contribution
The paper presents a new culturally rich dataset for Chinese rebus art understanding and highlights the limitations of current VLMs in interpreting such art forms.
Findings
State-of-the-art VLMs struggle with Chinese rebus tasks.
Existing models often produce biased and hallucinated explanations.
In-context learning yields only limited improvement.
Abstract
Large vision-language models (VLMs) have demonstrated remarkable abilities in understanding everyday content. However, their performance in the domain of art, particularly culturally rich art forms, remains underexplored. As a pearl of human wisdom and creativity, art encapsulates complex cultural narratives and symbolism. In this paper, we introduce the Pun Rebus Art Dataset, a multimodal dataset for art understanding deeply rooted in traditional Chinese culture. We focus on three primary tasks: identifying salient visual elements, matching elements with their symbolic meanings, and explaining the conveyed messages. Our evaluation reveals that state-of-the-art VLMs struggle with these tasks, often providing biased and hallucinated explanations and showing limited improvement through in-context learning. By releasing the Pun Rebus Art Dataset, we aim to facilitate the development of…
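To make the second task (matching elements with their symbolic meanings) concrete, here is a minimal sketch. It assumes a hypothetical record type (`RebusItem`) and a simple exact-match accuracy (`matching_accuracy`); neither is taken from the paper, whose actual data schema and evaluation metrics are not described in this summary. The two example puns are standard ones in Chinese rebus art: bat (蝠, fú) sounds like 福 (fortune), and fish (鱼, yú) sounds like 余 (surplus).

```python
# Hypothetical sketch: the record layout and exact-match scoring below are
# illustrative assumptions, not the Pun Rebus Art Dataset's actual schema or metric.
from dataclasses import dataclass


@dataclass
class RebusItem:
    """One artwork with its annotated visual elements and their symbolic meanings."""
    image_id: str
    element_meanings: dict[str, str]  # visual element -> symbolic meaning


def matching_accuracy(item: RebusItem, predicted: dict[str, str]) -> float:
    """Fraction of annotated elements whose predicted meaning matches the label exactly."""
    if not item.element_meanings:
        return 0.0
    correct = sum(
        1
        for element, meaning in item.element_meanings.items()
        if predicted.get(element) == meaning
    )
    return correct / len(item.element_meanings)


# Bat (蝠, fú) puns on 福 (fortune); fish (鱼, yú) puns on 余 (surplus).
item = RebusItem(
    image_id="example-001",  # hypothetical identifier
    element_meanings={"bat": "fortune", "fish": "surplus"},
)
prediction = {"bat": "fortune", "fish": "wealth"}
print(matching_accuracy(item, prediction))  # 0.5
```

Exact string matching is the simplest possible choice and would undercount paraphrased but correct answers (for example "good fortune" versus "fortune"); a real evaluation would likely need a more lenient comparison.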
Taxonomy
Topics: Cultural Heritage Management and Preservation
