Nunchi-Bench: Benchmarking Language Models on Cultural Reasoning with a Focus on Korean Superstition

Kyuhee Kim; Sangah Lee

arXiv:2507.04014·cs.CL·July 8, 2025

TL;DR

Nunchi-Bench is a new benchmark for evaluating large language models' understanding of Korean culture and superstitions, revealing significant challenges in cultural reasoning and highlighting the importance of cultural framing.

Contribution

This paper introduces Nunchi-Bench, a benchmark of 247 questions across 31 topics for assessing LLMs' cultural reasoning about Korean superstitions, together with customized scoring metrics and a comparison of model behavior in Korean and English.

Findings

1. Models recognize factual cultural knowledge but struggle with practical application.

2. Explicit cultural framing improves model performance more than language alone.

3. Benchmark and leaderboard are publicly available for future research.

Abstract

As large language models (LLMs) become key advisors in various domains, their cultural sensitivity and reasoning skills are crucial in multicultural environments. We introduce Nunchi-Bench, a benchmark designed to evaluate LLMs' cultural understanding, with a focus on Korean superstitions. The benchmark consists of 247 questions spanning 31 topics, assessing factual knowledge, culturally appropriate advice, and situational interpretation. We evaluate multilingual LLMs in both Korean and English to analyze their ability to reason about Korean cultural contexts and how language variations affect performance. To systematically assess cultural reasoning, we propose a novel evaluation strategy with customized scoring metrics that capture the extent to which models recognize cultural nuances and respond appropriately. Our findings highlight significant challenges in LLMs' cultural reasoning.…

Code & Models

Datasets

koreankiwi99/Nunchi-Bench
dataset · 29 downloads
