Disentangling Language and Culture for Evaluating Multilingual Large Language Models
Jiahao Ying, Wei Tang, Yiran Zhao, Yixin Cao, Yu Rong, Wenxuan Zhang

TL;DR
This paper proposes a Dual Evaluation Framework that assesses multilingual LLMs along two separate dimensions, linguistic medium and cultural context. It reports a Cultural-Linguistic Synergy, in which models perform better when a question's cultural context matches its language, and neuron activation patterns that could inform model evaluation.
Contribution
It introduces a novel framework for culturally and linguistically nuanced evaluation of multilingual LLMs, highlighting the importance of cultural context in model performance assessment.
Findings
Models perform better with culturally aligned questions.
A higher proportion of specific neurons is activated when a question is set in the cultural context of its language.
Cultural-linguistic synergy affects multilingual model performance.
Abstract
This paper introduces a Dual Evaluation Framework to comprehensively assess the multilingual capabilities of LLMs. By decomposing the evaluation along the dimensions of linguistic medium and cultural context, this framework enables a nuanced analysis of LLMs' ability to process questions within both native and cross-cultural contexts cross-lingually. Extensive evaluations are conducted on a wide range of models, revealing a notable "Cultural-Linguistic Synergy" phenomenon, where models exhibit better performance when questions are culturally aligned with the language. This phenomenon is further explored through interpretability probing, which shows that a higher proportion of specific neurons are activated in a language's cultural context. This activation proportion could serve as a potential indicator for evaluating multilingual performance during model training. Our findings challenge…
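The two-dimensional decomposition described in the abstract can be pictured as an accuracy grid indexed by (language, culture). The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `accuracy_grid` and `synergy_gap`, the record format, the language and culture labels, and the toy data are all hypothetical and chosen only for illustration. The gap it computes (aligned accuracy minus mean accuracy on the other cultures) is one plausible way to quantify the synergy; the paper may define it differently.

```python
from collections import defaultdict


def accuracy_grid(records):
    """Aggregate (language, culture, is_correct) records into per-cell accuracy.

    The record format is an assumption made for this sketch.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for language, culture, is_correct in records:
        totals[(language, culture)] += 1
        hits[(language, culture)] += int(bool(is_correct))
    return {cell: hits[cell] / totals[cell] for cell in totals}


def synergy_gap(grid, native_culture):
    """Per language: accuracy on the aligned culture minus the mean
    accuracy on every other culture asked in that same language.

    A positive gap is consistent with the synergy effect; this metric
    is an illustrative assumption, not taken from the paper.
    """
    gaps = {}
    for language, home in native_culture.items():
        aligned = grid.get((language, home))
        others = [
            acc
            for (lang, culture), acc in grid.items()
            if lang == language and culture != home
        ]
        if aligned is None or not others:
            continue  # not enough cells to compare for this language
        gaps[language] = aligned - sum(others) / len(others)
    return gaps


# Toy, made-up results: each tuple is (question language, cultural context, correct?)
records = [
    ("zh", "Chinese", True), ("zh", "Chinese", True),
    ("zh", "English", True), ("zh", "English", False),
    ("en", "English", True), ("en", "English", True),
    ("en", "Chinese", False), ("en", "Chinese", True),
]

grid = accuracy_grid(records)
gaps = synergy_gap(grid, {"zh": "Chinese", "en": "English"})
print(gaps)
```

Keeping the language and the culture as separate keys is the point of the design: the same cultural question set can be scored in several languages, and the same language can be scored across several cultural question sets, so the two effects are not confounded.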
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
