From National Curricula to Cultural Awareness: Constructing Open-Ended Culture-Specific Question Answering Dataset
Haneul Yoo, Won Ik Cho, Geunhye Kim, Jiyoon Han

TL;DR
This paper presents CuCu, a scalable framework that converts national curricula into culture-specific question-answer datasets, exemplified by KCaQA from Korean social studies, to improve cultural awareness in language models.
Contribution
We introduce CuCu, a novel automated multi-agent framework that transforms curricula into open-ended, culture-specific QA datasets, enhancing cultural alignment in language models.
Findings
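KCaQA contains 34.1k open-ended, culture-specific QA pairs built from the Korean national social studies curriculum.
Quantitative and qualitative analyses suggest that KCaQA covers culture-specific topics.
The same analyses suggest that its responses are grounded in local sociocultural contexts.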
Abstract
Large language models (LLMs) achieve strong performance on many tasks, but their progress remains uneven across languages and cultures, often reflecting values latent in English-centric training data. To enable practical cultural alignment, we propose a scalable approach that leverages national social studies curricula as a foundation for culture-aware supervision. We introduce CuCu, an automated multi-agent LLM framework that transforms national textbook curricula into open-ended, culture-specific question-answer pairs. Applying CuCu to the Korean national social studies curriculum, we construct KCaQA, comprising 34.1k open-ended QA pairs. Our quantitative and qualitative analyses suggest that KCaQA covers culture-specific topics and produces responses grounded in local sociocultural contexts.
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Expert finding and Q&A systems
