CaLMQA: Exploring culturally specific long-form question answering across 23 languages
Shane Arora, Marzena Karpinska, Hung-Ting Chen, Ipsita Bhattacharjee, Mohit Iyyer, Eunsol Choi

TL;DR
This paper introduces CaLMQA, a multilingual dataset of culturally specific questions across 23 languages, and evaluates large language models' ability to generate accurate long-form answers, highlighting challenges in low-resource and culturally nuanced contexts.
Contribution
The paper presents the first dataset of culturally specific questions in 23 languages and assesses LLMs' performance on these questions, revealing significant errors, especially in low-resource languages.
Findings
LLMs often make surface-level errors in many languages.
Answers to culturally specific questions have more factual errors.
Low-resource languages pose greater challenges for LLM accuracy.
Abstract
Despite rising global usage of large language models (LLMs), their ability to generate long-form answers to culturally specific questions remains unexplored in many languages. To fill this gap, we perform the first study of textual multilingual long-form QA by creating CaLMQA, a dataset of 51.7K culturally specific questions across 23 different languages. We define culturally specific questions as those that refer to concepts unique to one or a few cultures, or have different answers depending on the cultural or regional context. We obtain these questions by crawling naturally-occurring questions from community web forums in high-resource languages, and by hiring native speakers to write questions in under-resourced, rarely-studied languages such as Fijian and Kirundi. Our data collection methodologies are translation-free, enabling the collection of culturally unique questions like…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
