Benchmarking Cross-Lingual Semantic Alignment in Multilingual Embeddings

Wen G. Gong

arXiv:2601.09732·cs.CL·January 16, 2026

TL;DR

This paper introduces Semantic Affinity, a new metric for evaluating cross-lingual semantic alignment in multilingual embeddings, revealing that explicit translation supervision is essential for high-quality alignment, regardless of model scale or architecture.

Contribution

The paper presents Semantic Affinity (SA), a bounded metric of cross-lingual semantic alignment, and Semanscope, a framework that pairs SA with PHATE visualization. Together they are used to benchmark 13 multilingual embedding models across 4 datasets.

Findings

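01. Top BERT models (LaBSE, USE, S-BERT) achieve strong alignment (SA from 0.68 to 0.70) through translation-pair supervision.

02. LLM embeddings plateau at moderate alignment (SA between 0.55 and 0.61) regardless of scale from 0.6 B to 8 B.

03. MLM-only BERT models (mBERT, XLM-R; SA < 0.50) fail to align despite extensive multilingual training.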

Abstract

With hundreds of multilingual embedding models available, practitioners lack clear guidance on which provide genuine cross-lingual semantic alignment versus task performance through language-specific patterns. Task-driven benchmarks (MTEB) may mask fundamental alignment shortcomings. We introduce Semantic Affinity (SA), a bounded (between 0 and 1) metric measuring inter-lingual to intra-lingual spread ratio using cosine distance, combined with PHATE visualization in our Semanscope framework. Benchmarking 13 models across 4 datasets (52 experiments) reveals a three-tier structure: (1) Top BERT models (LaBSE SA = 0.70, USE SA = 0.68, S-BERT SA = 0.68) achieve strong alignment via translation-pair supervision; (2) LLM embeddings plateau at SA between 0.55 and 0.61 regardless of 0.6 B to 8 B scale; (3) MLM-only BERT models (mBERT, XLM-R, SA < 0.50) fail despite more than 100 language…
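The abstract describes SA only in words (a bounded ratio of inter-lingual to intra-lingual spread under cosine distance), and the exact formula is not given here. The sketch below is therefore an illustrative stand-in, not the paper's definition: it assumes a hypothetical score `intra / (intra + inter)`, where `inter` is the mean cosine distance between translations of the same concept and `intra` is the mean cosine distance between different concepts within one language. The function names and the `(languages, concepts, dim)` array layout are likewise assumptions made for the example.

```python
import numpy as np


def cosine_distance_matrix(x):
    """Pairwise cosine distances between the rows of x, clipped to [0, 2]."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return np.clip(1.0 - unit @ unit.T, 0.0, 2.0)


def mean_offdiag(d):
    """Mean of the off-diagonal entries of a square matrix."""
    mask = ~np.eye(d.shape[0], dtype=bool)
    return float(d[mask].mean())


def affinity_sketch(emb):
    """Hypothetical bounded alignment score (NOT the paper's SA formula).

    emb has shape (languages, concepts, dim); emb[l, c] is the embedding of
    concept c expressed in language l.
    """
    n_lang, n_concepts, _ = emb.shape
    if n_lang < 2 or n_concepts < 2:
        raise ValueError("need at least 2 languages and 2 concepts")
    # Inter-lingual spread: how far apart translations of one concept sit.
    inter = np.mean([mean_offdiag(cosine_distance_matrix(emb[:, c, :]))
                     for c in range(n_concepts)])
    # Intra-lingual spread: how far apart different concepts sit in one language.
    intra = np.mean([mean_offdiag(cosine_distance_matrix(emb[l]))
                     for l in range(n_lang)])
    total = inter + intra
    return float(intra / total) if total > 0 else 0.0


# Two languages, two orthogonal concepts.
aligned = np.stack([np.eye(2), np.eye(2)])        # translations coincide
swapped = np.stack([np.eye(2), np.eye(2)[::-1]])  # translations mismatched
print(affinity_sketch(aligned), affinity_sketch(swapped))
```

Under this assumed form the score stays in [0, 1] because cosine distances are non-negative. It approaches 1 when translations coincide while distinct concepts stay separated, and it sits near 0.5 when translations are no closer than unrelated words.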

Taxonomy

Topics: Topic Modeling · Language and cultural evolution · Natural Language Processing Techniques