Culture Cartography: Mapping the Landscape of Cultural Knowledge
Caleb Ziems, William Held, Jane Yu, Amir Goldberg, David Grusky, Diyi Yang

TL;DR
This paper introduces CultureCartography, a mixed-initiative approach for mapping culture-specific knowledge in LLMs, enabling more accurate and culturally aware language models through human-LLM collaboration.
Contribution
It proposes a novel mixed-initiative methodology and tool, CultureExplorer, to identify and fill gaps in LLMs' cultural knowledge, improving model performance on cultural benchmarks.
Findings
CultureExplorer identifies missing cultural knowledge more effectively than baseline methods.
Fine-tuning Llama-3.1-8B on the collected data improves accuracy on cultural benchmarks by up to 19.2%.
The approach improves LLMs' cultural awareness even when the models have web search capabilities.
Abstract
To serve global users safely and productively, LLMs need culture-specific knowledge that might not be learned during pre-training. How do we find such knowledge that is (1) salient to in-group users, but (2) unknown to LLMs? The most common solutions are single-initiative: either researchers define challenging questions that users passively answer (traditional annotation), or users actively produce data that researchers structure as benchmarks (knowledge extraction). The process would benefit from mixed-initiative collaboration, where users guide the process to meaningfully reflect their cultures, and LLMs steer the process towards more challenging questions that meet the researcher's goals. We propose a mixed-initiative methodology called CultureCartography. Here, an LLM initializes annotation with questions for which it has low-confidence answers, making explicit both its prior…
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Expert Finding and Q&A Systems
