WebSTAR: Scalable Data Synthesis for Computer Use Agents with Step-Level Filtering

Yifei He; Pranit Chawla; Yaser Souri; Subhojit Som; Xia Song

arXiv:2512.10962·cs.LG·February 6, 2026

Open Access · 1 dataset

TL;DR

This paper introduces a scalable data synthesis pipeline with step-level filtering to generate high-quality training data for computer use agents, significantly improving their performance and robustness.

Contribution

The paper presents a novel step-level filtering method for synthesizing reliable training data from noisy model rollouts, enabling scalable training of computer use agents.

Findings

1. The WebSTAR dataset comprises 13.3K trajectories and 267K steps.

2. A 7B model trained on WebSTAR surpasses state-of-the-art open-source models by over 15%.

3. WebSCORE and StepRM provide efficient, high-quality step-level grading and reward modeling.

Abstract

Computer use agents (CUAs) can operate real-world digital interfaces but remain difficult to train due to the high cost of graphical user interface (GUI) interaction and the scarcity of high-quality trajectory data. Existing datasets rely on human demonstrations, limiting scalability. A natural alternative is to synthesize data from strong CUAs, yet their rollouts are highly noisy, with incorrect or suboptimal actions making up a large proportion of the steps, which renders naive imitation ineffective. To tackle this challenge, we introduce a scalable data synthesis pipeline that transforms noisy rollouts into reliable supervision without human annotation. The core idea is step-level filtering, which evaluates actions individually to retain only correct steps, complemented by reasoning augmentation for improved planning. Using this pipeline, we construct WebSTAR, a dataset of 13.3K…
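To make the step-level filtering idea concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the `Step`, `Trajectory`, and `Example` types, the `filter_steps` function, the 0.5 threshold, and the toy grader are all hypothetical names and values chosen for illustration. The sketch also assumes that rejected steps stay in the history (since they did happen in the rollout) and are only dropped as supervision targets; the abstract does not say how the paper handles this.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Step:
    observation: str  # e.g. a description of the screen before acting
    action: str       # action proposed by the rollout model


@dataclass
class Trajectory:
    task: str
    steps: List[Step]


@dataclass
class Example:
    task: str
    history: List[Step]  # everything that happened before the target step
    target: Step         # the step kept as a supervision target


# Hypothetical grader interface: scores one step in [0, 1] given task and history.
Grader = Callable[[str, Sequence[Step], Step], float]


def filter_steps(traj: Trajectory, grader: Grader, threshold: float = 0.5) -> List[Example]:
    """Grade each step on its own and keep only those scoring at or above threshold."""
    kept: List[Example] = []
    for i, step in enumerate(traj.steps):
        history = traj.steps[:i]  # assumption: rejected steps remain in the context
        if grader(traj.task, history, step) >= threshold:
            kept.append(Example(traj.task, list(history), step))
    return kept


def toy_grader(task: str, history: Sequence[Step], step: Step) -> float:
    """Stand-in for a learned grader: treats any scroll action as incorrect."""
    return 0.0 if step.action.startswith("scroll") else 1.0


traj = Trajectory(
    task="Send a message through the contact form",
    steps=[
        Step("home page", "click('Contact')"),
        Step("contact page", "scroll(down)"),
        Step("contact page", "type('email', 'a@b.com')"),
    ],
)
examples = filter_steps(traj, toy_grader)
```

In this toy run the scroll step is discarded as a training target, while the two remaining steps become separate examples. Contrast this with trajectory-level filtering, which would have to keep or discard all three steps together.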

Code & Models

Datasets

microsoft/WebSTAR
dataset · 666 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Artificial Intelligence in Games