Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs

Maxim Ifergan; Leshem Choshen; Roee Aharoni; Idan Szpektor; Omri Abend

arXiv:2408.10646·cs.CL·August 21, 2024

Open Access

TL;DR

This paper investigates how multilingual large language models represent and share factual knowledge across languages. It proposes a method for measuring representation sharing and finds that script similarity influences how much knowledge is shared.

Contribution

The study introduces a novel methodology to measure cross-lingual knowledge sharing in LLMs and provides empirical insights into factors affecting this sharing, such as script similarity.

Findings

01

High consistency does not guarantee shared representation.

02

Script similarity significantly influences knowledge sharing.

03

Full sharing could improve accuracy by up to 150%.

Abstract

The veracity of a factoid is largely independent of the language it is written in. However, language models are inconsistent in their ability to answer the same factual question across languages. This raises questions about how LLMs represent a given fact across languages. We explore multilingual factual knowledge through two aspects: the model's ability to answer a query consistently across languages, and the ability to "store" answers in a shared representation for several languages. We propose a methodology to measure the extent of representation sharing across languages by repurposing knowledge editing methods. We examine LLMs with various multilingual configurations using a new multilingual dataset. We reveal that high consistency does not necessarily imply shared representation, particularly for languages with different scripts. Moreover, we find that script similarity is a…
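
The two aspects the abstract distinguishes can be illustrated with a toy calculation. The sketch below is only an illustration under stated assumptions, not the authors' metric or code: the function names `consistency` and `edit_transfer_rate`, the query IDs, and all answers are hypothetical. It assumes answers have already been normalized to language-independent labels, and it treats "an edit made in one language also changes the answer in another" as a rough proxy for shared representation.

```python
from typing import Dict

Answers = Dict[str, str]  # query id -> normalized answer label (hypothetical format)


def consistency(answers_a: Answers, answers_b: Answers) -> float:
    """Fraction of shared queries for which two languages give the same answer."""
    shared = answers_a.keys() & answers_b.keys()
    if not shared:
        raise ValueError("no shared queries to compare")
    agree = sum(1 for q in shared if answers_a[q] == answers_b[q])
    return agree / len(shared)


def edit_transfer_rate(edit_targets: Answers, post_edit_answers: Answers) -> float:
    """Fraction of facts edited in a source language whose new answer
    also shows up when the model is queried in a target language."""
    shared = edit_targets.keys() & post_edit_answers.keys()
    if not shared:
        raise ValueError("no edited queries to compare")
    moved = sum(1 for q in shared if post_edit_answers[q] == edit_targets[q])
    return moved / len(shared)


# Toy data (made up): answers before editing, in two languages.
lang_a = {"q1": "Paris", "q2": "Berlin", "q3": "Rome", "q4": "Madrid"}
lang_b = {"q1": "Paris", "q2": "Berlin", "q3": "Rome", "q4": "Lisbon"}

# Hypothetical edits applied through language A, and the answers
# observed afterwards when asking in language B.
edits_in_a = {"q1": "Lyon", "q2": "Munich", "q3": "Milan", "q4": "Seville"}
lang_b_after = {"q1": "Lyon", "q2": "Berlin", "q3": "Rome", "q4": "Lisbon"}

print(consistency(lang_a, lang_b))                   # 0.75
print(edit_transfer_rate(edits_in_a, lang_b_after))  # 0.25
```

In this made-up case the two languages agree on three of four queries, yet only one of four edits carries over, which is the kind of gap between consistency and shared representation that the abstract describes.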

Taxonomy

Topics: Semantic Web and Ontologies