Surfer 2: The Next Generation of Cross-Platform Computer Use Agents
Mathieu Andreux, Märt Bakler, Yanael Barbier, Hamza Benchekroun, Emilien Biré, Antoine Bonnet, Riaz Bordie, Nathan Bout, Matthias Brunel, Aleix Cambray, Pierre-Louis Cedoz, Antoine Chassang, Gautier Cloix, Ethan Connelly, Alexandra Constantinou, Ramzi De Coster

TL;DR
Surfer 2 is a unified agent architecture that operates purely from visual observations and achieves state-of-the-art performance across web, desktop, and mobile environments, surpassing prior systems and, with multiple attempts, human performance.
Contribution
The paper introduces Surfer 2, an architecture that integrates hierarchical context management, decoupled planning and execution, and self-verification with adaptive recovery to support reliable long-horizon tasks across platforms.
Findings
Achieves 97.1% accuracy on WebVoyager
Outperforms all prior systems without task-specific fine-tuning
Exceeds human performance on all benchmarks when given multiple attempts
Abstract
Building agents that generalize across web, desktop, and mobile environments remains an open challenge, as prior systems rely on environment-specific interfaces that limit cross-platform deployment. We introduce Surfer 2, a unified architecture operating purely from visual observations that achieves state-of-the-art performance across all three environments. Surfer 2 integrates hierarchical context management, decoupled planning and execution, and self-verification with adaptive recovery, enabling reliable operation over long task horizons. Our system achieves 97.1% accuracy on WebVoyager, 69.6% on WebArena, 60.1% on OSWorld, and 87.1% on AndroidWorld, outperforming all prior systems without task-specific fine-tuning. With multiple attempts, Surfer 2 exceeds human performance on all benchmarks. These results demonstrate that systematic orchestration amplifies foundation model…
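The abstract names three components: hierarchical context management, decoupled planning and execution, and self-verification with adaptive recovery. A minimal sketch of how such a loop could be wired together is shown below. It is an illustration under stated assumptions, not the authors' implementation: `Context`, `run_agent`, the `toy_*` stubs, and the retry policy are all hypothetical names and choices introduced here, and the stubs stand in for the vision-based models the paper relies on.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Context:
    """Hypothetical two-level context: the task goal plus a per-step log."""
    goal: str
    completed: List[str] = field(default_factory=list)
    step_log: List[str] = field(default_factory=list)


def run_agent(
    goal: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str, Context], str],
    verify: Callable[[str, str], bool],
    max_retries: int = 2,
) -> Tuple[bool, Context]:
    """Plan once, then execute and verify each subgoal, retrying on failure."""
    ctx = Context(goal=goal)
    for subgoal in plan(goal):  # planning is decoupled from execution
        for attempt in range(max_retries + 1):
            observation = execute(subgoal, ctx)
            ctx.step_log.append(f"{subgoal} (attempt {attempt + 1}): {observation}")
            if verify(subgoal, observation):  # self-verification
                ctx.completed.append(subgoal)
                break
        else:
            # Retries exhausted for this subgoal: report failure (assumed policy).
            return False, ctx
    return True, ctx


# Toy stand-ins (assumptions) for the planner, executor, and verifier.
def toy_plan(goal: str) -> List[str]:
    return ["open page", "fill form", "submit"]


_calls = {}


def toy_execute(subgoal: str, ctx: Context) -> str:
    _calls[subgoal] = _calls.get(subgoal, 0) + 1
    # Simulate a transient failure on the first "fill form" attempt.
    if subgoal == "fill form" and _calls[subgoal] == 1:
        return "error"
    return "ok"


def toy_verify(subgoal: str, observation: str) -> bool:
    return observation == "ok"


success, ctx = run_agent("submit a form", toy_plan, toy_execute, toy_verify)
print(success, len(ctx.step_log))
```

In this toy run the second subgoal fails once and is retried, so the log holds four entries for three subgoals. Keeping the planner, executor, and verifier as separate callables is one simple way to express the decoupling the abstract describes; a real system would replace the retry loop with whatever recovery strategy the paper specifies.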
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Software Engineering Methodologies · Context-Aware Activity Recognition Systems
