CLEAR: Cross-Lingual Enhancement in Alignment via Reverse-training

Seungyoon Lee; Minhyuk Kim; Seongtae Hong; Youngjoon Jang; Dongsuk Oh; Heuiseok Lim

arXiv:2604.05821·cs.CL·April 15, 2026

TL;DR

CLEAR introduces a reverse-training loss function that uses English as a bridge to improve cross-lingual retrieval, with gains that are most pronounced in low-resource languages.

Contribution

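The paper proposes a novel reverse-training scheme that improves cross-lingual alignment and retrieval performance. It targets two limitations of standard contrastive learning: difficulty capturing fundamental alignment between languages, and degraded performance in well-aligned languages such as English.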

Findings

01. Achieves up to 15% improvement in cross-lingual retrieval tasks.

02. Enhances performance in low-resource languages without degrading English results.

03. Remains effective in multilingual training scenarios.

Abstract

Existing multilingual embedding models often encounter challenges in cross-lingual scenarios due to imbalanced linguistic resources and less consideration of cross-lingual alignment during training. Although standardized contrastive learning approaches for cross-lingual adaptation are widely adopted, they may struggle to capture fundamental alignment between languages and degrade performance in well-aligned languages such as English. To address these challenges, we propose Cross-Lingual Enhancement in Retrieval via Reverse-training (CLEAR), a novel loss function utilizing a reverse training scheme to improve retrieval performance across diverse cross-lingual retrieval scenarios. CLEAR leverages an English passage as a bridge to strengthen alignments between the target language and English, ensuring robust performance in the cross-lingual retrieval task. Our extensive experiments…
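The abstract does not give the formula for the CLEAR loss, so the following is only an illustrative sketch of the general idea it describes: a standard contrastive (InfoNCE) term from target-language queries to English passages, plus a term in the opposite direction where the English passage is the anchor. Treating "reverse" as this opposite direction is an assumption made here, not the paper's definition. The names `info_nce`, `bridge_loss`, `temperature`, and `reverse_weight` are hypothetical, and the toy 2-dimensional embeddings are invented for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, candidates, temperature=0.05):
    """Mean InfoNCE loss; candidates[i] is the positive for anchors[i],
    and every other candidate in the batch serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def bridge_loss(target_queries, english_passages,
                temperature=0.05, reverse_weight=1.0):
    """Hypothetical combined objective (an assumption, not the paper's loss):
    forward term (target-language query -> English passage) plus a weighted
    reverse term (English passage -> target-language query)."""
    forward = info_nce(target_queries, english_passages, temperature)
    reverse = info_nce(english_passages, target_queries, temperature)
    return forward + reverse_weight * reverse


# Toy embeddings, invented for illustration only.
english_passages = [[0.9, 0.1], [0.1, 0.9]]
aligned_queries = [[1.0, 0.0], [0.0, 1.0]]   # each query near its own passage
swapped_queries = [[0.0, 1.0], [1.0, 0.0]]   # each query near the wrong passage

aligned_loss = bridge_loss(aligned_queries, english_passages)
swapped_loss = bridge_loss(swapped_queries, english_passages)
print(aligned_loss < swapped_loss)
```

In this sketch, setting `reverse_weight=0.0` recovers the plain one-directional contrastive loss, which makes it easy to compare the two objectives on the same batch.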

Code & Models

Repositories

dltmddbs100/CLEAR (GitHub)