Do LLMs Need to Think in One Language? Correlation between Latent Language and Task Performance

Shintaro Ozaki; Tatsuya Hiraoka; Hiroto Otake; Hiroki Ouchi; Masaru Isonuma; Benjamin Heinzerling; Kentaro Inui; Taro Watanabe; Yusuke Miyao; Yohei Oseki; Yu Takagi

arXiv:2505.21458·cs.CL·May 28, 2025

TL;DR

This paper investigates whether LLMs perform better on downstream tasks when they keep a consistent internal (latent) language. It finds that models adapt their internal representations near the final layers, which often reduces the importance of latent language consistency.

Contribution

The study systematically analyzes how latent language consistency affects downstream task performance. It shows that models adapt their internal representations near the final layers, so a consistent latent language is not always necessary for optimal performance.

Findings

1. Latent language consistency does not always enhance task performance.
2. Models adapt internal representations to match target languages near the output.
3. Consistency in latent language has limited impact on translation and geo-culture tasks.

Abstract

Large Language Models (LLMs) are known to consistently process information in an internal language in which they are proficient, referred to as the latent language, which may differ from the input or output languages. However, how a discrepancy between the latent language and the input and output languages affects downstream task performance remains largely unexplored. While many studies examine the latent language of LLMs, few address its influence on task performance. In our study, we hypothesize that thinking consistently in the latent language enhances downstream task performance. To validate this, we vary the input prompt language across multiple downstream tasks and analyze the correlation between latent language consistency and task performance. We create datasets consisting of questions from diverse domains such as translation and geo-culture, which are influenced by the…
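
The correlation analysis described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation. The consistency measure used here (the share of intermediate layers whose detected language matches the most frequent one), the function names `consistency_score` and `pearson`, and every number in `runs` are hypothetical and serve only as illustration.

```python
from collections import Counter
from math import sqrt


def consistency_score(layer_languages):
    """Share of layers whose detected language equals the most common one.

    ASSUMPTION: this definition is illustrative, not the paper's metric.
    """
    if not layer_languages:
        raise ValueError("need at least one layer label")
    counts = Counter(layer_languages)
    return counts.most_common(1)[0][1] / len(layer_languages)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL data: for each prompt language, the language detected at each
# intermediate layer and the task accuracy observed with that prompt language.
runs = {
    "en": (["en", "en", "en", "en"], 0.80),
    "ja": (["en", "en", "ja", "ja"], 0.70),
    "de": (["en", "de", "en", "de"], 0.75),
}

scores = [consistency_score(layers) for layers, _ in runs.values()]
accuracies = [accuracy for _, accuracy in runs.values()]
print("correlation:", pearson(scores, accuracies))
```

A coefficient near zero on real measurements would be in line with the finding that latent language consistency does not always enhance task performance. The toy numbers above carry no such evidence.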

Taxonomy

Topics: Natural Language Processing Techniques · Artificial Intelligence in Law