Evaluating Cultural Knowledge Processing in Large Language Models: A Cognitive Benchmarking Framework Integrating Retrieval-Augmented Generation
Hung-Shin Lee, Chen-Chi Chang, Ching-Yuan Chen, Yun-Hsiang Hsu

TL;DR
This paper introduces a cognitive benchmarking framework combining Bloom's Taxonomy and Retrieval-Augmented Generation to evaluate large language models' processing of culturally specific knowledge, demonstrated on Taiwanese Hakka cultural data.
Contribution
It presents a framework that integrates Bloom's Taxonomy with Retrieval-Augmented Generation to assess how LLMs process cultural knowledge across six hierarchical cognitive domains.
Findings
The framework measures the semantic accuracy and cultural relevance of LLM-generated responses.
The evaluation reveals both strengths and limitations of LLMs on cultural knowledge tasks.
Benchmarking on a Taiwanese Hakka digital cultural archive demonstrates the framework's practical applicability.
Abstract
This study proposes a cognitive benchmarking framework to evaluate how large language models (LLMs) process and apply culturally specific knowledge. The framework integrates Bloom's Taxonomy with Retrieval-Augmented Generation (RAG) to assess model performance across six hierarchical cognitive domains: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Using a curated Taiwanese Hakka digital cultural archive as the primary testbed, the evaluation measures the semantic accuracy and cultural relevance of LLM-generated responses.
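The pipeline described in the abstract can be pictured with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the names `BloomLevel`, `similarity`, `retrieve`, `build_prompt`, and `mean_score_by_level` are hypothetical, the archive is a plain list of passages, and a bag-of-words cosine similarity stands in for whatever semantic accuracy metric the paper actually uses. Cultural relevance scoring is not modeled here.

```python
import math
import re
from collections import Counter, defaultdict
from enum import Enum


class BloomLevel(Enum):
    """The six cognitive domains named in the abstract, lowest to highest."""
    REMEMBERING = 1
    UNDERSTANDING = 2
    APPLYING = 3
    ANALYZING = 4
    EVALUATING = 5
    CREATING = 6


def tokenize(text):
    # Lowercase word tokens; a real system would use a proper tokenizer.
    return re.findall(r"\w+", text.lower())


def similarity(a, b):
    # Bag-of-words cosine similarity: an assumed stand-in for the paper's
    # semantic accuracy measure, which the abstract does not specify.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count * vb[token] for token, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, archive, k=2):
    # Toy retrieval step: rank archive passages by similarity to the question.
    ranked = sorted(archive, key=lambda p: similarity(question, p), reverse=True)
    return ranked[:k]


def build_prompt(level, question, passages):
    # Hypothetical prompt template that tags the question with its Bloom level
    # and prepends the retrieved passages as context.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        f"Cognitive level: {level.name.title()}\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


def mean_score_by_level(records):
    # records: iterable of (BloomLevel, model_answer, reference_answer).
    # Returns the mean similarity per cognitive level.
    buckets = defaultdict(list)
    for level, answer, reference in records:
        buckets[level].append(similarity(answer, reference))
    return {level: sum(s) / len(s) for level, s in buckets.items()}
```

In use, each benchmark question would be paired with a `BloomLevel`, passed through `retrieve` and `build_prompt`, sent to the model under test, and the returned answer scored against a reference with `mean_score_by_level`. Reporting scores per level is what lets such a benchmark show where along the cognitive hierarchy a model's handling of cultural knowledge weakens.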
Taxonomy
Topics: Computational and Text Analysis Methods · Digital Humanities and Scholarship · Language and cultural evolution
