Step-level Optimization for Efficient Computer-use Agents

Jinbiao Wei; Kangqi Ni; Yilun Zhao; Guo Gan; Arman Cohan

arXiv:2604.27151 · cs.AI · May 1, 2026

TL;DR

This paper introduces an event-driven, step-level optimization framework for computer-use agents. It allocates compute adaptively: small policies handle routine steps, and the agent escalates to a large model only when necessary, which reduces cost and latency.

Contribution

The paper proposes a modular, on-demand compute allocation method that uses risk monitors to improve efficiency without retraining existing agents.

Findings

01. Reduces unnecessary large model calls in GUI tasks.

02. Detects and recovers from progress stalls and semantic drift.

03. Maintains performance while lowering computational costs.

Abstract

Computer-use agents provide a promising path toward general software automation because they can interact directly with arbitrary graphical user interfaces instead of relying on brittle, application-specific integrations. Despite recent advances in benchmark performance, strong computer-use agents remain expensive and slow in practice, since most systems invoke large multimodal models at nearly every interaction step. We argue that this uniform allocation of compute is fundamentally inefficient for long-horizon GUI tasks. Such trajectories are highly heterogeneous: many steps are routine and can be handled reliably by smaller, cheaper policies, while errors tend to concentrate at a relatively small number of high-risk moments. Across computer-use benchmarks, these failures repeatedly take two forms: progress stalls, where the agent loops, repeats ineffective actions, or fails to make…
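
To make the escalation idea concrete, here is a minimal Python sketch of step-level routing. It is an illustration under stated assumptions, not the authors' implementation: `StallMonitor`, `run_episode`, and the policy callables are hypothetical names, and the "same action repeated N times" rule is a simple stand-in heuristic for the progress-stall monitor described above. A semantic-drift monitor is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, Deque, Iterable, List, Tuple
from collections import deque

# Hypothetical policy type: maps an observation (e.g. a screen description) to an action string.
Policy = Callable[[str], str]


@dataclass
class StallMonitor:
    """Toy progress-stall detector (an assumption, not the paper's monitor):
    fires when the last `patience` proposed actions are all identical."""
    patience: int = 3
    recent: Deque[str] = field(default_factory=deque)

    def update(self, action: str) -> bool:
        self.recent.append(action)
        if len(self.recent) > self.patience:
            self.recent.popleft()
        return len(self.recent) == self.patience and len(set(self.recent)) == 1

    def reset(self) -> None:
        self.recent.clear()


def run_episode(
    observations: Iterable[str],
    small_policy: Policy,
    large_policy: Policy,
    monitor: StallMonitor,
) -> List[Tuple[str, str]]:
    """Run the cheap policy at every step; call the large model only when the monitor fires.
    Returns a log of (action, model_used) pairs."""
    log: List[Tuple[str, str]] = []
    for obs in observations:
        action, used = small_policy(obs), "small"
        if monitor.update(action):
            # Escalate: replace the stalled action with the large model's choice.
            action, used = large_policy(obs), "large"
            monitor.reset()
        log.append((action, used))
    return log
```

With a small policy that keeps proposing the same click on a stuck screen, the monitor fires on the third repeat and only that step is sent to the large policy; every other step stays on the cheap path. Because the wrapper only needs two callables, it could in principle sit around an existing agent without retraining it, which matches the modular framing in the Contribution section.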

Code & Models

Repositories

yale-nlp/StepWise (GitHub)
