Cross-Lingual LLM-Judge Transfer via Evaluation Decomposition
Ivaxi Sheth, Zeno Jonke, Amin Mantrach, Saab Mansour

TL;DR
This paper presents a universal, language-agnostic evaluation framework for cross-lingual assessment of large language models, enabling effective transfer without extensive target-language annotations.
Contribution
It introduces a decomposition-based evaluation method, built around a Universal Criteria Set (UCS), that transfers across languages with minimal supervision.
Findings
Consistent improvements over strong baselines across multiple languages and model backbones.
Effective transfer without target-language annotations.
Supports diverse faithfulness evaluation tasks.
Abstract
As large language models are increasingly deployed across diverse real-world applications, extending automated evaluation beyond English has become a critical challenge. Existing evaluation approaches are predominantly English-focused, and adapting them to other languages is hindered by the scarcity and cost of human-annotated judgments in most languages. We introduce a decomposition-based evaluation framework built around a Universal Criteria Set (UCS). UCS consists of a shared, language-agnostic set of evaluation dimensions and yields an interpretable intermediate representation that supports cross-lingual transfer with minimal supervision. Experiments on multiple faithfulness tasks across languages and model backbones demonstrate consistent improvements over strong baselines without requiring target-language annotations.
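The abstract describes the pipeline only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. The three criterion names, the stubbed judge, and the logistic-regression aggregator are hypothetical choices made for illustration. The sketch shows the general shape of the idea: a judge scores each shared criterion separately, the resulting per-criterion score vector acts as the interpretable intermediate representation, and a lightweight aggregator fit on source-language labels is reused unchanged for a target language.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical criteria for illustration only; NOT the paper's actual UCS.
CRITERIA: List[str] = [
    "supported_by_source",
    "no_contradiction",
    "no_unverifiable_detail",
]

# A judge maps (source, response, criterion) to a score in [0, 1].
# In practice this would be an LLM prompted once per criterion; here it is stubbed.
Judge = Callable[[str, str, str], float]


def decompose(judge: Judge, source: str, response: str) -> List[float]:
    """Return the intermediate representation: one score per criterion, fixed order."""
    return [judge(source, response, c) for c in CRITERIA]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_aggregator(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> List[float]:
    """Fit logistic-regression weights (last entry is the bias) with batch
    gradient descent on source-language (e.g. English) human labels."""
    d = len(features[0])
    w = [0.0] * (d + 1)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
            err = p - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability that a response is faithful, given its criterion scores."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


# Toy source-language training data: criterion scores plus faithfulness labels.
en_features = [[0.9, 1.0, 0.8], [0.95, 0.9, 0.9], [0.2, 0.1, 0.3], [0.1, 0.3, 0.2]]
en_labels = [1, 1, 0, 0]
weights = train_aggregator(en_features, en_labels)


def toy_judge(source: str, response: str, criterion: str) -> float:
    # Stand-in for a per-criterion LLM call; always returns the same score.
    return 0.85


# Target-language example: only the judge is re-run; no target-language labels.
de_features = decompose(toy_judge, "Der Vertrag endet im Mai.", "Der Vertrag endet im Mai.")
print(predict(weights, de_features))
```

The design choice being illustrated is that the aggregator only ever sees criterion scores, never raw text, so under this assumption it can be trained once in English and applied to any language the judge can score.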
