Language-Specific Latent Process Hinders Cross-Lingual Performance

Zheng Wei Lim; Alham Fikri Aji; Trevor Cohn

arXiv:2505.13141 · cs.CL · September 29, 2025

TL;DR

This paper investigates why large language models struggle with consistent cross-lingual reasoning, revealing that language-specific latent processes hinder shared understanding, but steering models towards shared semantic spaces can improve multilingual performance.

Contribution

It uncovers the impact of language-specific representations on cross-lingual transfer and proposes a method to enhance multilingual reasoning by aligning latent processes.
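As a rough illustration of what steering a latent process can look like, here is one common form of activation steering, written as a minimal sketch rather than the authors' exact procedure. A direction is computed as the difference between mean hidden states for a pivot language (assumed here to stand in for the shared semantic space) and mean hidden states for the target language. That direction is then added, scaled by `alpha`, to a hidden state at one layer. The function names, the choice of pivot, the scale and the toy vectors are all hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized hidden-state vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(source_states: List[Vector], pivot_states: List[Vector]) -> Vector:
    """Hypothetical steering direction: mean(pivot) - mean(source).

    source_states: hidden states for prompts in the target language.
    pivot_states: hidden states for the same prompts in a pivot language,
    assumed (for this sketch only) to approximate the shared semantic space.
    """
    src = mean_vector(source_states)
    tgt = mean_vector(pivot_states)
    return [t - s for s, t in zip(src, tgt)]


def steer(hidden: Vector, direction: Vector, alpha: float = 1.0) -> Vector:
    """Add the scaled steering direction to a single hidden state."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    # Toy 3-dimensional hidden states, invented for illustration.
    target_lang = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]]
    pivot_lang = [[2.0, 1.0, 2.0], [4.0, 1.0, 0.0]]
    direction = steering_vector(target_lang, pivot_lang)
    print(direction)                                   # [1.0, 1.0, 0.0]
    print(steer([0.0, 0.0, 0.0], direction, alpha=0.5))  # [0.5, 0.5, 0.0]
```

In a real model the same addition would be applied to the hidden state at the chosen layer during the forward pass, for example through a PyTorch forward hook. Which layers and scales actually help is an empirical question.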

Findings

01

Larger models rely less on shared representations but are more capable of knowledge retrieval across languages.

02

Representation dissimilarity across languages correlates with inconsistent model outputs.

03

Steering latent processing towards shared semantic space improves multilingual reasoning performance.
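Finding 02 links representation dissimilarity to inconsistent outputs. The sketch below shows one way multiple-choice predictions and their cross-lingual consistency might be scored. It assumes each answer option receives one score per question, for instance the log-probability of its answer letter. It also assumes that "consistency" means the share of parallel questions on which two languages pick the same option. The helper names, language codes and numbers are illustrative only.

```python
from typing import List

Scores = List[float]  # one score per answer option; higher means more likely


def ranking(option_scores: Scores) -> List[int]:
    """Option indices ordered from most to least likely (ties keep index order)."""
    return sorted(range(len(option_scores)), key=lambda i: option_scores[i], reverse=True)


def predict(option_scores: Scores) -> int:
    """The predicted answer is the top-ranked option."""
    return ranking(option_scores)[0]


def accuracy(questions: List[Scores], gold: List[int]) -> float:
    """Share of questions whose top-ranked option matches the gold index."""
    return sum(predict(s) == g for s, g in zip(questions, gold)) / len(gold)


def consistency(lang_a: List[Scores], lang_b: List[Scores]) -> float:
    """Share of parallel questions on which both languages pick the same option."""
    agree = sum(predict(a) == predict(b) for a, b in zip(lang_a, lang_b))
    return agree / len(lang_a)


if __name__ == "__main__":
    # Hypothetical log-probabilities for two parallel questions with three options.
    en = [[-0.1, -2.0, -3.0], [-1.5, -0.3, -2.0]]
    id_ = [[-0.5, -0.9, -3.0], [-0.4, -1.0, -2.0]]
    gold = [0, 1]
    print(accuracy(en, gold), accuracy(id_, gold), consistency(en, id_))  # 1.0 0.5 0.5
```

Comparing only the top choice is the loosest notion of agreement. Comparing the full `ranking` output across languages would be a stricter alternative.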

Abstract

Large language models (LLMs) are demonstrably capable of cross-lingual transfer, but can produce inconsistent output when prompted with the same queries written in different languages. To understand how language models are able to generalize knowledge from one language to others, we measure representation similarity between languages and apply the logit lens to interpret the implicit steps LLMs take to solve multilingual multiple-choice reasoning questions. Our analyses reveal that LLMs predict inconsistently and are less accurate because they rely on representations that are dissimilar across languages, rather than working in a shared semantic space. While larger models are more multilingual, we show that their hidden states are more likely to dissociate from the shared representation than those of smaller models, yet they are nevertheless more capable of retrieving knowledge embedded across…
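The logit lens mentioned in the abstract decodes an intermediate hidden state as if its layer were the last one. The hidden state is passed through the model's final normalization and its unembedding (output) matrix. Below is a minimal sketch with toy weights. It assumes an RMSNorm-style final norm with its learned gain omitted, and a tiny vocabulary of answer letters. The layer indices, weights and hidden states are invented for illustration.

```python
import math
from typing import Dict, List


def rms_norm(x: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm without a learned gain (a simplifying assumption of this sketch)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(hidden: List[float], unembed: List[List[float]], vocab: List[str]) -> Dict[str, float]:
    """Read out an intermediate hidden state as a next-token distribution.

    `unembed` holds one row per vocabulary item (V x d), so each logit is the
    dot product of the normalized hidden state with that token's row.
    """
    h = rms_norm(hidden)
    logits = [sum(w * v for w, v in zip(row, h)) for row in unembed]
    return dict(zip(vocab, softmax(logits)))


if __name__ == "__main__":
    vocab = ["A", "B", "C", "D"]  # answer letters of a multiple-choice prompt
    unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # toy weights
    # Invented hidden states at the final token position for three layers.
    per_layer = {8: [0.2, 0.1], 16: [0.1, 0.9], 24: [0.0, 2.0]}
    for layer, hidden in per_layer.items():
        probs = logit_lens(hidden, unembed, vocab)
        top = max(probs, key=probs.get)
        print(layer, top, round(probs[top], 3))
```

Tracking how the top-decoded token changes from layer to layer is what lets the logit lens expose intermediate steps. With a real model, the hidden states would be taken from each layer's output at the position being predicted.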

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

Cross-lingual consistency is a recent topic. Measuring it helps evaluate the fairness of LLMs, which is very important in real-world applications.

Weaknesses

There are some concerns.
1. Existing work [1] overshadows the novelty and contribution of this paper. For example, this paper follows a similar experimental design to [1], including CKA and logit lens examinations, layer-wise analysis, and activation steering.
2. The experiments are limited, which leaves the paper inconclusive. The authors only conducted experiments on multiple-choice datasets. What about generation tasks?
3. While the paper is clear, the authors spend too much space on introd…

Reviewer 02 · Rating 8 · Confidence 3

Strengths

1. The paper provides useful insights into how knowledge is represented and shared internally across languages in LLMs. The authors investigate how LLMs transfer knowledge across languages, and their experiments establish the usefulness of a shared semantic space for cross-lingual transfer.
2. The paper proposes a cross-lingual steering approach to improve cross-lingual transfer for smaller models.
3. The evaluation method is robust: the authors use ranking order for MCQ-style questions

Weaknesses

1. The steering evaluation could include cross-dataset generalization to strengthen the claims. The current results only cover effects on the same dataset.

Reviewer 03 · Rating 4 · Confidence 3

Strengths

- Builds a framework for analysing cross-lingual transfer with well-selected metrics, such as CKA, cosine similarity and the logit lens, to quantify language representation overlap. This provides a clear and interpretable way to study cross-lingual transfer.
- The analysis across models and layers is comprehensive.
- The paper reframes multilingual reasoning as a latent-space alignment problem, providing a clear direction for multilingual output consistency.
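For reference, linear CKA compares two sets of representations of the same n inputs, for example the hidden states of parallel sentences in two languages at one layer. It is invariant to orthogonal transformations and isotropic scaling. The sketch below implements the standard linear CKA formula in pure Python, with cosine similarity included for comparison. With X and Y column-centered, the formula is ||YᵀX||²_F / (||XᵀX||_F · ||YᵀY||_F). The exact layers, pooling and inputs the authors used are not stated on this page, and the function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = examples, columns = hidden dimensions


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors, e.g. mean-pooled sentence states."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def center_columns(m: Matrix) -> Matrix:
    """Subtract each column's mean so every dimension is zero-mean across examples."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[x - mu for x, mu in zip(row, means)] for row in m]


def cross_product(a: Matrix, b: Matrix) -> Matrix:
    """Compute a^T b for two matrices with the same number of rows."""
    return [[sum(x * y for x, y in zip(col_a, col_b)) for col_b in zip(*b)]
            for col_a in zip(*a)]


def frobenius_sq(m: Matrix) -> float:
    """Squared Frobenius norm."""
    return sum(x * x for row in m for x in row)


def linear_cka(x: Matrix, y: Matrix) -> float:
    """||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on column-centered inputs."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = frobenius_sq(cross_product(yc, xc))
    denominator = (math.sqrt(frobenius_sq(cross_product(xc, xc)))
                   * math.sqrt(frobenius_sq(cross_product(yc, yc))))
    return numerator / denominator


if __name__ == "__main__":
    # Toy "hidden states" for 3 parallel sentences, 2 dimensions each.
    lang_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]
    rotated = [[-b, a] for a, b in lang_a]  # an orthogonal transform of lang_a
    unrelated = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(round(linear_cka(lang_a, rotated), 6))  # 1.0
    print(linear_cka(lang_a, unrelated) < 1.0)     # True
```

Centering the feature columns is what makes this the linear special case of CKA. Representations that are constant across all inputs make the denominator zero, so such inputs would need to be filtered out first.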

Weaknesses

- The task is limited entirely to multiple-choice reasoning questions. This is a clever choice that makes cross-lingual transfer easy to measure with unambiguous labels. However, it also limits generalisation to open-ended and generative multilingual reasoning.
- “Humans have an innate ability to apply common knowledge and perform reasoning skills consistently across different languages” is itself a very contentious claim. This paper also doesn’t need Jerry Fodor’s nativism as motivation.
- …
