CLM-Bench: Benchmarking and Analyzing Cross-lingual Misalignment of LLMs in Knowledge Editing
Yucheng Hu, Wei Zhou, Juesi Xiao

TL;DR
This paper introduces CLM-Bench, a Chinese-centric benchmark built from culturally native data for evaluating multilingual knowledge editing in LLMs, and uses it to reveal significant cross-lingual misalignment and the limitations of current editing methods.
Contribution
We propose CLM-Bench, a culturally aware benchmark built with native Chinese data, and analyze cross-lingual misalignment in LLM knowledge editing through geometric layer-wise representation analysis.
Findings
Significant cross-lingual misalignment in knowledge edits.
Layer-wise representations for different languages are nearly orthogonal.
Mixed-lingual editing shows linear additivity of edit vectors.
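The last two findings are geometric claims, so a small numeric illustration may help. The sketch below is not the paper's measurement procedure; it only shows what "nearly orthogonal" and "linearly additive" could mean for edit vectors, assuming each edit is summarized as a flat vector. The names `cosine`, `additivity_error`, `delta_zh`, `delta_en`, and `delta_mixed` are hypothetical, and the toy vectors are made up for illustration rather than taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def additivity_error(delta_mixed, delta_a, delta_b):
    """Relative L2 distance between a mixed-lingual edit vector and the
    sum of the two monolingual edit vectors (0.0 means exactly additive)."""
    summed = [a + b for a, b in zip(delta_a, delta_b)]
    diff = math.sqrt(sum((m - s) ** 2 for m, s in zip(delta_mixed, summed)))
    return diff / math.sqrt(sum(s * s for s in summed))


# Toy edit vectors (illustrative only, not data from the paper).
delta_zh = [1.0, 0.0, 0.0, 0.0]     # hypothetical Chinese-only edit
delta_en = [0.0, 1.0, 0.0, 0.0]     # hypothetical English-only edit
delta_mixed = [1.0, 1.0, 0.0, 0.0]  # hypothetical mixed-lingual edit

print(cosine(delta_zh, delta_en))                         # 0.0
print(additivity_error(delta_mixed, delta_zh, delta_en))  # 0.0
```

In this toy case a cosine of 0.0 corresponds to fully orthogonal edit directions, and an additivity error of 0.0 means the mixed-lingual edit equals the sum of the two monolingual edits. Real edit vectors would presumably give values that are only close to these idealized ones.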
Abstract
Knowledge Editing (KE) has emerged as a promising paradigm for updating facts in Large Language Models (LLMs) without retraining. However, progress in Multilingual Knowledge Editing (MKE) is currently hindered by biased evaluation frameworks. We observe that existing MKE benchmarks are typically constructed by mechanically translating English-centric datasets into target languages (e.g., English-to-Chinese). This approach introduces translation artifacts and neglects culturally specific entities native to the target language, failing to reflect the true knowledge distribution of LLMs. To address this, we propose CLM-Bench, a culture-aware benchmark constructed using a native Chinese-first methodology. We curate 1,010 high-quality CounterFact pairs rooted in Chinese cultural contexts and align them with English counterparts. Using CLM-Bench, we conduct extensive experiments on…
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques
