TL;DR
This paper introduces CUNIT, a benchmark dataset for evaluating how well large language models recognize concepts that play similar cultural roles in different countries, and it reveals limits in the models' current cultural awareness.
Contribution
The study presents CUNIT, a dataset of 1,425 evaluation examples built on 285 culture-specific concepts from 10 countries, for assessing LLMs' understanding of cultural unity. It then systematically evaluates how well LLMs identify cross-cultural concept associations.
Findings
LLMs show limited ability to capture cross-cultural concept associations.
Cultural associations vary significantly across different concept categories.
Geo-cultural proximity has minimal impact on LLMs' performance in understanding cultural similarities.
Abstract
Much work on the cultural awareness of large language models (LLMs) focuses on the models' sensitivity to geo-cultural diversity. However, in addition to cross-cultural differences, there also exists common ground across cultures. For instance, a bridal veil in the United States plays a culturally relevant role similar to that of a honggaitou in China. In this study, we introduce a benchmark dataset, CUNIT, for evaluating decoder-only LLMs in understanding the cultural unity of concepts. Specifically, CUNIT consists of 1,425 evaluation examples built upon 285 traditional culture-specific concepts across 10 countries. Based on a systematic manual annotation of culturally relevant features per concept, we calculate the cultural association between any pair of cross-cultural concepts. Built upon this dataset, we design a contrastive matching task to evaluate the LLMs' capability to identify highly…
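The abstract says that cultural association is calculated from manually annotated features, but the excerpt does not state the formula. The sketch below is therefore only an illustration under stated assumptions: it uses Jaccard overlap of feature sets as a stand-in association score, and the feature labels, the FEATURES table, and the function names association and pick_match are all hypothetical, not taken from CUNIT. It shows the general shape of a contrastive matching step, in which the candidate most strongly associated with a query concept is chosen.

```python
from typing import Dict, FrozenSet, List

# Hypothetical feature annotations for illustration only; not CUNIT data.
FEATURES: Dict[str, FrozenSet[str]] = {
    "bridal veil (US)": frozenset(
        {"wedding", "worn by bride", "covers face", "fabric"}
    ),
    "honggaitou (China)": frozenset(
        {"wedding", "worn by bride", "covers face", "fabric", "red"}
    ),
    "qipao (China)": frozenset({"fabric", "formal wear", "worn by women"}),
}


def association(a: str, b: str) -> float:
    """Jaccard overlap of two feature sets (an assumed stand-in score)."""
    fa, fb = FEATURES[a], FEATURES[b]
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


def pick_match(query: str, candidates: List[str]) -> str:
    """Return the candidate with the highest association to the query."""
    return max(candidates, key=lambda c: association(query, c))


if __name__ == "__main__":
    query = "bridal veil (US)"
    candidates = ["qipao (China)", "honggaitou (China)"]
    for c in candidates:
        print(c, round(association(query, c), 3))
    print("best match:", pick_match(query, candidates))
```

In an evaluation setting, a model's answer to the same query would be compared with the choice implied by the annotated features. How CUNIT scores associations and constructs its candidates may differ from this sketch.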
