Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation

Li Zhou; Lutong Yu; Dongchu Xie; Shaohuan Cheng; Wenyan Li; Haizhou Li

arXiv:2506.01565 · cs.CL · September 30, 2025

TL;DR

Hanfu-Bench is a new multimodal dataset that advances cross-temporal cultural understanding and transcreation of traditional Chinese attire, highlighting challenges for vision-language models in capturing cultural and temporal nuances.

Contribution

Introduces Hanfu-Bench, a multimodal dataset with tasks on cultural visual understanding and transcreation, emphasizing temporal aspects of Chinese culture.

Findings

01 Closed VLMs perform comparably to non-experts on cultural visual understanding but trail human experts by 10%.

02 Open VLMs perform worse than non-experts.

03 The best transcreation model achieves a success rate of only 42%.

Abstract

Culture is a rich and dynamic domain that evolves across both geography and time. However, existing studies on cultural understanding with vision-language models (VLMs) primarily emphasize geographic diversity, often overlooking the critical temporal dimensions. To bridge this gap, we introduce Hanfu-Bench, a novel, expert-curated multimodal dataset. Hanfu, a traditional garment spanning ancient Chinese dynasties, serves as a representative cultural heritage that reflects the profound temporal aspects of Chinese culture while remaining highly popular in Chinese contemporary society. Hanfu-Bench comprises two core tasks: cultural visual understanding and cultural image transcreation. The former task examines temporal-cultural feature recognition based on single- or multi-image inputs through multiple-choice visual question answering, while the latter focuses on transforming traditional…

Code & Models

Datasets

lizhou21/hanfu-bench
dataset · 14 downloads

Videos

Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation · underline

Taxonomy

Topics: Language, Metaphor, and Cognition