Nunchi-Bench: Benchmarking Language Models on Cultural Reasoning with a Focus on Korean Superstition
Kyuhee Kim, Sangah Lee

TL;DR
Nunchi-Bench is a new benchmark for evaluating large language models' understanding of Korean culture and superstitions, revealing significant challenges in cultural reasoning and highlighting the importance of cultural framing.
Contribution
This paper introduces Nunchi-Bench, a comprehensive benchmark for assessing LLMs' cultural reasoning, especially regarding Korean superstitions, with novel evaluation metrics and multilingual analysis.
Findings
Models recognize factual cultural knowledge but struggle with practical application.
Explicit cultural framing improves model performance more than the choice of prompt language alone.
Benchmark and leaderboard are publicly available for future research.
Abstract
As large language models (LLMs) become key advisors in various domains, their cultural sensitivity and reasoning skills are crucial in multicultural environments. We introduce Nunchi-Bench, a benchmark designed to evaluate LLMs' cultural understanding, with a focus on Korean superstitions. The benchmark consists of 247 questions spanning 31 topics, assessing factual knowledge, culturally appropriate advice, and situational interpretation. We evaluate multilingual LLMs in both Korean and English to analyze their ability to reason about Korean cultural contexts and how language variations affect performance. To systematically assess cultural reasoning, we propose a novel evaluation strategy with customized scoring metrics that capture the extent to which models recognize cultural nuances and respond appropriately. Our findings highlight significant challenges in LLMs' cultural reasoning.…
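To make the evaluation setup described in the abstract concrete, here is a minimal sketch of how scored responses might be grouped by task type and prompt language. Everything in it is an assumption for illustration: the `ScoredResponse` fields, the task-type labels, the language codes, and the numeric score are hypothetical and are not the paper's actual schema or scoring metrics.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Hypothetical labels for the three abilities named in the abstract:
# factual knowledge, culturally appropriate advice, situational interpretation.
TASK_TYPES = ("factual_knowledge", "advice", "interpretation")
# The abstract states that models are evaluated in Korean and English.
LANGUAGES = ("ko", "en")


@dataclass(frozen=True)
class ScoredResponse:
    """One model answer with an assigned score (assumed schema)."""
    topic: str       # one of the benchmark's topics; names here are placeholders
    task_type: str   # one of TASK_TYPES
    language: str    # one of LANGUAGES
    score: float     # placeholder scale; the paper's rubric is not reproduced here


def mean_scores(responses):
    """Return the average score for each (task_type, language) pair."""
    buckets = defaultdict(list)
    for r in responses:
        if r.task_type not in TASK_TYPES or r.language not in LANGUAGES:
            raise ValueError(f"unexpected task type or language: {r}")
        buckets[(r.task_type, r.language)].append(r.score)
    return {key: mean(values) for key, values in buckets.items()}
```

Grouping by both keys keeps the Korean and English results for the same task type side by side, which is the kind of comparison the abstract describes when it mentions language variations.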
