MobileDev-Bench: A Benchmark for Issue Resolution in Mobile Application Development

Moshood A. Fakorede; Krishna Upadhyay; A.B. Siddique; Umar Farooq

arXiv:2603.24946 · cs.SE · May 11, 2026

TL;DR

MobileDev-Bench is a benchmark for evaluating LLMs on real-world mobile app issue resolution. It shows that mobile fixes are large, multi-file changes that current models rarely resolve.

Contribution

The paper introduces a benchmark of 407 real-world issue-resolution tasks drawn from 19 production mobile applications. Each task pairs a developer-reported issue with executable test patches, so model-generated fixes can be validated automatically.

Findings

01  LLMs achieve only 3.23%–5.69% resolution rates on mobile issues.

02  Mobile fixes are complex, averaging 12.9 files and 334.6 lines changed.

03  41% of issues require coordinated changes across multiple artifact types.
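
The resolution rates quoted above are percentages of tasks for which a model-generated fix passes validation. As a minimal sketch of how such a figure could be computed, assume a task counts as resolved only when every test in its test patch passes; that criterion, the `TaskResult` record, and the toy data below are illustrative assumptions, not the benchmark's actual schema or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Outcome of one task (hypothetical record, not the benchmark's schema)."""
    task_id: str
    tests_passed: int
    tests_total: int

    @property
    def resolved(self) -> bool:
        # Assumption: resolved means every test ran and passed.
        return self.tests_total > 0 and self.tests_passed == self.tests_total


def resolution_rate(results: list[TaskResult]) -> float:
    """Return the percentage of resolved tasks, or 0.0 for an empty list."""
    if not results:
        return 0.0
    return 100.0 * sum(r.resolved for r in results) / len(results)


# Toy data for illustration only; these are not results from the paper.
toy_results = [
    TaskResult("android-001", tests_passed=5, tests_total=5),  # resolved
    TaskResult("flutter-002", tests_passed=3, tests_total=4),  # partial pass
    TaskResult("rn-003", tests_passed=0, tests_total=2),       # all failed
    TaskResult("android-004", tests_passed=0, tests_total=0),  # build failed, no tests ran
]

print(f"{resolution_rate(toy_results):.2f}%")  # prints "25.00%"
```

Treating a task with no executed tests as unresolved is a deliberate choice in this sketch: a patch that breaks the build should not count as a fix.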

Abstract

Large language models (LLMs) have shown strong performance on automated software engineering tasks, yet existing benchmarks focus primarily on library-style repositories, leaving mobile application development largely unexplored despite its framework-specific build systems, heterogeneous artifact types, and coordinated multi-file fix requirements. We introduce MobileDev-Bench, a benchmark comprising 407 real-world issue-resolution tasks collected from 19 production mobile applications spanning Android Native (Java/Kotlin), React Native (TypeScript), and Flutter (Dart). Each task pairs a verified developer-reported issue with executable test patches, enabling fully automated validation of model-generated fixes within mobile build environments. The benchmark exhibits substantially greater patch complexity than prior benchmarks: fixes modify 12.9 files and 334.6 lines on average, and 41%…
