MPR-GUI: Benchmarking and Enhancing Multilingual Perception and Reasoning in GUI Agents
Ruihan Chen, Qiming Li, Xiaocheng Feng, Weihong Zhong, Xiaoliang Yang, Yuxuan Gu, Zekun Zhou, Yunfei Lu, Haoyu Ren, Kun Chen, Dandan Tu, Bing Qin

TL;DR
This paper introduces MPR-GUI-Bench, a benchmark for multilingual GUI perception and reasoning, and proposes GUI-XLI, a method to improve cross-lingual performance by aligning hidden states across languages.
Contribution
The paper presents a new multilingual benchmark with fine-grained diagnostics and an intervention method to enhance cross-lingual GUI agent capabilities.
Findings
The benchmark identifies consistent perception and reasoning gaps between English and non-English settings.
The proposed GUI-XLI method reduces cross-lingual performance gaps by an average of 6.5%.
Reasoning-intensive tasks are particularly challenging in non-English languages.
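As a rough illustration of what a "cross-lingual gap" could mean in a strictly aligned setting, the sketch below averages the per-task accuracy difference between English and each other language. The function name, task labels, language label, and all numbers are hypothetical and made up for illustration; the paper's actual metric and scores are not given here.

```python
from statistics import mean


def cross_lingual_gaps(acc, pivot="en"):
    """Average per-task accuracy gap (pivot minus language) for each non-pivot language.

    `acc` maps language -> {task name -> accuracy}. Because the environments are
    assumed to be aligned, every language is expected to cover the same tasks.
    """
    gaps = {}
    for lang, per_task in acc.items():
        if lang == pivot:
            continue
        # Positive values mean the pivot language scores higher on average.
        gaps[lang] = mean(acc[pivot][task] - per_task[task] for task in acc[pivot])
    return gaps


# Hypothetical scores, not results from the paper.
scores = {
    "en": {"perception": 0.80, "reasoning": 0.70},
    "lang_a": {"perception": 0.75, "reasoning": 0.60},
}
gaps = cross_lingual_gaps(scores)
```

With aligned tasks, a per-task version of the same subtraction would show whether the gap concentrates in reasoning-heavy tasks rather than perception tasks.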
Abstract
Large Vision-Language Models (LVLMs) have shown strong potential as multilingual Graphical User Interface (GUI) agents, as evidenced by existing GUI benchmarks. However, these benchmarks exhibit two primary limitations: (1) although Perception and Reasoning (P&R) capabilities are fundamental for GUI agents, current benchmarks lack fine-grained diagnostics to identify which specific capabilities lead to task failures, hindering targeted improvements; (2) existing benchmarks fail to provide a strictly aligned cross-lingual evaluation environment, introducing confounding factors that prevent isolating the language impact on GUI agent performance. To address these issues, we propose the Multilingual P&R GUI Benchmark (MPR-GUI-Bench), featuring strictly aligned environments across six languages and eight fine-grained P&R tasks. Our benchmark reveals consistent P&R gaps between English and…
