CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies
Weiyan Shi, Ryan Li, Yutong Zhang, Caleb Ziems, Chunhua Yu, Raya Horesh, Rogério Abreu de Paula, Diyi Yang

TL;DR
CultureBank is a large, diverse cultural knowledge base built from online communities such as TikTok and Reddit, designed to improve language models' cultural awareness and to support grounded evaluation of it.
Contribution
We introduce a scalable pipeline to construct a comprehensive cultural knowledge base from online communities, enabling evaluation and fine-tuning of culturally aware language models.
Findings
A language model fine-tuned on CultureBank performs better on two downstream cultural tasks in a zero-shot setting.
CultureBank contains diverse cultural perspectives and contextual scenarios.
The pipeline enables large-scale, community-driven cultural knowledge extraction.
Abstract
To enhance language models' cultural awareness, we design a generalizable pipeline to construct cultural knowledge bases from different online communities on a massive scale. With the pipeline, we construct CultureBank, a knowledge base built upon users' self-narratives with 12K cultural descriptors sourced from TikTok and 11K from Reddit. Unlike previous cultural knowledge resources, CultureBank contains diverse views on cultural descriptors to allow flexible interpretation of cultural knowledge, and contextualized cultural scenarios to help grounded evaluation. With CultureBank, we evaluate different LLMs' cultural awareness, and identify areas for improvement. We also fine-tune a language model on CultureBank: experiments show that it achieves better performances on two downstream cultural tasks in a zero-shot setting. Finally, we offer recommendations based on our findings for…
Taxonomy
Topics: Wikis in Education and Collaboration · Semantic Web and Ontologies · Natural Language Processing Techniques
Methods: Balanced Selection · Attentive Walk-Aggregating Graph Neural Network
