GAIA-v2-LILT: Multilingual Adaptation of Agent Benchmark beyond Translation

Yunsu Kim; Kaden Uhlig; Joern Wuebker

arXiv:2604.24929·cs.CL·April 29, 2026

TL;DR

This paper introduces GAIA-v2-LILT, a multilingual extension of the GAIA agent benchmark covering five non-English languages, built with a refined adaptation workflow that makes cross-lingual performance measurement more accurate.

Contribution

It proposes a workflow for adapting English benchmarks into multiple languages with explicit functional alignment, cultural alignment, and difficulty calibration, using both automated checks and human review, to reduce measurement error in multilingual agent evaluation.

Findings

01

The workflow improves agent success rates by up to 32.7% over minimally translated versions.

02

The closest audited setting comes within 3.1% of English performance.

03

Substantial gaps to English performance remain in many other settings.

Abstract

Agent benchmarks remain largely English-centric, while their multilingual versions are often built with machine translation (MT) and limited post-editing. We argue that, for agentic tasks, this minimal workflow can easily break benchmark validity through query-answer misalignment or culturally off-target context. We propose a refined workflow for adapting English benchmarks into multiple languages with explicit functional alignment, cultural alignment, and difficulty calibration using both automated checks and human review. Using this workflow, we introduce GAIA-v2-LILT, a re-audited multilingual extension of GAIA covering five non-English languages. In experiments, our workflow improves agent success rates by up to 32.7% over minimally translated versions, bringing the closest audited setting to within 3.1% of English performance while substantial gaps remain in many other cases. This…
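To make the reported quantities concrete, the sketch below computes a success rate per benchmark setting and its gap to English. It is an illustration only. The function names, the setting labels (`en`, `de_minimal`, `de_adapted`), and the per-task outcomes are hypothetical and not taken from the paper or its repository. The abstract does not say whether 32.7% and 3.1% are absolute percentage points or relative changes, so the sketch assumes absolute percentage points.

```python
from typing import Dict, List


def success_rate(outcomes: List[bool]) -> float:
    """Share of tasks the agent solved, expressed as a percentage."""
    if not outcomes:
        raise ValueError("no task outcomes given")
    return 100.0 * sum(outcomes) / len(outcomes)


def gaps_to_reference(
    results: Dict[str, List[bool]], reference: str = "en"
) -> Dict[str, float]:
    """Gap in percentage points between the reference setting and each other setting."""
    ref_rate = success_rate(results[reference])
    return {
        name: ref_rate - success_rate(outcomes)
        for name, outcomes in results.items()
        if name != reference
    }


# Hypothetical per-task outcomes (True = task solved); NOT results from the paper.
toy_results = {
    "en": [True, True, True, False],            # 3 of 4 solved
    "de_minimal": [True, False, False, False],  # minimally translated version
    "de_adapted": [True, True, False, False],   # version after the adaptation workflow
}

gaps = gaps_to_reference(toy_results)
# Improvement of the adapted version over the minimal one, in percentage points.
improvement = gaps["de_minimal"] - gaps["de_adapted"]
```

Under this reading, a smaller gap for the adapted version than for the minimally translated one indicates that part of the apparent multilingual deficit came from the benchmark adaptation rather than from the agent.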

Code & Models

Repositories

lilt/gaia-v2-lilt (GitHub)