Optimizing Language Models for Crosslingual Knowledge Consistency

Tianyu Liu; Jirui Qi; Mrinmaya Sachan; Ryan Cotterell; Raquel Fernández; Arianna Bisazza

arXiv:2603.04678·cs.CL·May 11, 2026

TL;DR

This paper introduces DCO, a reinforcement learning method that enhances crosslingual knowledge consistency in multilingual large language models without requiring explicit reward models.

Contribution

The paper proposes Direct Consistency Optimization (DCO), a DPO-inspired method that improves crosslingual knowledge consistency without an explicit reward model, because the training signal is derived from the LLM itself.

Findings

01 DCO significantly improves crosslingual consistency across diverse LLMs.

02 DCO outperforms existing methods when trained with multiple languages.

03 DCO demonstrates effectiveness in bilingual settings and out-of-domain scenarios.

Abstract

Large language models are known to often exhibit inconsistent knowledge. This is particularly problematic in multilingual scenarios, where models are likely to be asked similar questions in different languages, and inconsistent responses can undermine their reliability. In this work, we show that this issue can be mitigated using reinforcement learning with a structured reward function, which leads to an optimal policy with consistent crosslingual responses. We introduce Direct Consistency Optimization (DCO), a DPO-inspired method that requires no explicit reward model and is derived directly from the LLM itself. Comprehensive experiments show that DCO significantly improves crosslingual consistency across diverse LLMs and outperforms existing methods when training with samples of multiple languages, while complementing DPO when gold labels are available. Extra experiments demonstrate…
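
For readers unfamiliar with the DPO objective that DCO is described as being inspired by, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not the DCO objective, which this summary does not spell out. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions; the inputs are taken to be summed log-probabilities of a chosen and a rejected response under the trained policy and a frozen reference model.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (NOT the DCO objective).

    All names and the default beta are illustrative assumptions.
    """
    # Implicit rewards: beta times the log-ratio of policy to reference.
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward

    # -log(sigmoid(margin)), computed as softplus(-margin) for stability.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
    # Policy prefers the chosen response more than the reference does:
    # the loss falls below log(2).
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The point of the sketch is that no separate reward model appears: the reward is implicit in the log-ratio between the policy and the reference model, which is the property the abstract highlights when it calls DCO "DPO-inspired" and free of an explicit reward model.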

Code & Models

Repositories

Betswish/ConsistencyRL (GitHub)