Cross-Lingual Response Consistency in Large Language Models: An ILR-Informed Evaluation of Claude Across Six Languages

Camelia Baluta

arXiv:2604.27137 · cs.CL · May 1, 2026

TL;DR

This study evaluates Claude (Sonnet 4.6) across six languages against the ILR Skill Level Descriptions, combining automated metrics with expert assessment. It finds cross-lingual variation in response length, style, and cultural content.

Contribution

The paper introduces an ILR-based evaluation framework for multilingual LLMs that pairs automated quantitative metrics with expert qualitative assessment to characterize cross-lingual differences in responses.

Findings

01

French responses are approximately 30% longer than German responses on identical prompts.

02

Creative and affective prompt clusters show the highest cross-lingual surface divergence.

03

Expert analysis identified five patterns of cross-lingual variation.

Abstract

This paper introduces a systematic evaluation framework grounded in the Interagency Language Roundtable (ILR) Skill Level Descriptions and applies it to Claude (Sonnet 4.6) across six languages: English, French, Romanian, Spanish, Italian, and German. We administer a battery of 12 semantically equivalent prompt clusters spanning ILR complexity levels 1 through 3+, collect 216 responses (12 prompts, 6 languages, 3 runs), and analyze outputs through a two-layer methodology combining automated quantitative metrics with expert ILR qualitative assessment. Quantitative analysis reveals that French responses are approximately 30% longer than German responses on identical prompts, and that creative and affective clusters show the highest cross-lingual surface divergence. Qualitative analysis, conducted by a six-language professional with 12 years of ILR/OPI assessment experience, identifies…
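The sampling design and the length comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the author's code: the function names, the ratio-of-means definition of "longer", and the toy length values are all assumptions made for illustration, and the abstract does not say whether length is measured in words, tokens, or characters.

```python
from itertools import product
from statistics import mean

# Languages and counts as stated in the abstract.
LANGUAGES = ["English", "French", "Romanian", "Spanish", "Italian", "German"]
N_PROMPTS = 12
N_RUNS = 3


def response_grid():
    """Enumerate every (prompt, language, run) cell of the design."""
    prompts = range(1, N_PROMPTS + 1)
    runs = range(1, N_RUNS + 1)
    return list(product(prompts, LANGUAGES, runs))


def relative_length_gap(lengths_a, lengths_b):
    """Hypothetical metric: how much longer A is than B, as a ratio of means."""
    return mean(lengths_a) / mean(lengths_b) - 1.0


grid = response_grid()
print(len(grid))  # 12 prompts x 6 languages x 3 runs = 216

# Toy lengths chosen only to illustrate the arithmetic; not data from the paper.
toy_french = [260, 260, 260]
toy_german = [200, 200, 200]
print(f"{relative_length_gap(toy_french, toy_german):.0%}")  # 30%
```

A ratio of means is only one reasonable reading of "approximately 30% longer"; a per-prompt ratio averaged across the 12 clusters would weight short and long prompts differently and could give a different figure.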
