Do Multi-Agents Dream of Electric Screens? Achieving Perfect Accuracy on AndroidWorld Through Task Decomposition
Pierre-Louis Favreau, Jean-Pierre Lo, Clement Guiguet, Charles Simon-Meunier, Nicolas Dehandschoewercker, Allen G. Roush, Judah Goldfeder, Ravid Shwartz-Ziv

TL;DR
Minitap is a multi-agent system that solves every task in the AndroidWorld benchmark through task decomposition, validation, and meta-cognitive strategies, surpassing human performance (80%).
Contribution
The paper introduces Minitap, a multi-agent architecture that combines six specialized agents, deterministic validation of text input, and cycle-detecting meta-cognitive reasoning. It achieves perfect accuracy on AndroidWorld and outperforms single-agent baselines.
Findings
Achieved 100% success on the AndroidWorld benchmark (all 116 tasks)
Multi-agent decomposition contributes +21 points over single-agent baselines
Meta-cognitive reasoning adds +9 points to the success rate
Abstract
We present Minitap, a multi-agent system that achieves 100% success on the AndroidWorld benchmark, making it the first system to fully solve all 116 tasks and to surpass human performance (80%). We first analyze why single-agent architectures fail: context pollution from mixed reasoning traces, silent text input failures that go undetected by the agent, and repetitive action loops with no escape. Minitap addresses each failure through a targeted mechanism: cognitive separation across six specialized agents, deterministic post-validation of text input against device state, and meta-cognitive reasoning that detects cycles and triggers strategy changes. Ablations show that multi-agent decomposition contributes +21 points over single-agent baselines, verified execution adds +7 points, and meta-cognition adds +9 points. We release Minitap as open-source software at https://github.com/minitap-ai/mobile-use
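To make two of the mechanisms named in the abstract more concrete, here is a minimal Python sketch of post-validating a text input against device state and of detecting a repetitive action loop. It is an illustration under stated assumptions, not Minitap's actual implementation: the names text_matches_device, CycleDetector, read_field, window, and max_repeats are all hypothetical, and the exact-match comparison and repeat threshold are choices made for this example only.

```python
from collections import deque
from typing import Callable, Deque, Tuple


def text_matches_device(expected: str, read_field: Callable[[], str]) -> bool:
    """Post-validate a text input by reading the field back from the device.

    `read_field` is a hypothetical callback that returns the text currently
    shown in the target input field. Exact equality is an assumption here.
    """
    return read_field() == expected


class CycleDetector:
    """Flags a loop when the same (screen, action) step recurs too often.

    `window` and `max_repeats` are illustrative parameters, not values
    taken from the paper.
    """

    def __init__(self, window: int = 6, max_repeats: int = 3) -> None:
        # Only the most recent `window` steps are kept.
        self.history: Deque[Tuple[str, str]] = deque(maxlen=window)
        self.max_repeats = max_repeats

    def record(self, screen_id: str, action: str) -> bool:
        """Record one step; return True if a strategy change should trigger."""
        step = (screen_id, action)
        self.history.append(step)
        return self.history.count(step) >= self.max_repeats


if __name__ == "__main__":
    detector = CycleDetector()
    for _ in range(3):
        looping = detector.record("settings_screen", "tap:wifi_toggle")
    print("loop detected:", looping)
    print("text ok:", text_matches_device("hello", lambda: "helo"))
```

In this sketch a silent input failure surfaces as a False return from text_matches_device, and a True return from record is the point at which an agent could switch strategy instead of repeating the same action.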
Taxonomy
Topics: Personal Information Management and User Behavior · Context-Aware Activity Recognition Systems · Green IT and Sustainability
