Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection

Fatemeh Pesaran zadeh; Seyeon Choi; Xing Han Lù; Siva Reddy; Gunhee Kim

arXiv:2605.20291·cs.LG·May 21, 2026

TL;DR

Weasel is a trajectory selection method for web agents. It picks training steps that balance importance and diversity, which improves out-of-domain generalization and reduces training cost.

Contribution

Introduces an importance-diversity trajectory selection algorithm, combined with AXTree pruning and style matching, for efficient offline training of web agents.

Findings

1. Achieves 9.7-12.5× training speedups over standard fine-tuning.

2. Improves out-of-domain performance across multiple datasets and models.

3. Reduces training cost while maintaining or improving agent capabilities.

Abstract

Large language models (LLMs) have enabled web agents that follow natural language goals through multi-step browser interactions. However, agents fine-tuned on specific trajectories and domains often struggle to generalize out of domain, and offline training can be compute-inefficient due to noisy, redundant trajectories and long accessibility-tree (AXTree) states. To address both issues, we propose Weasel, a trajectory selection method for offline training of web agents. Weasel selects a fixed-budget subset of trajectory steps by optimizing an objective that balances unary importance with pairwise diversity over states, websites, and interaction patterns, solved efficiently with a greedy algorithm. We further improve efficiency with target-centered AXTree pruning that keeps only content around the ground-truth action target, and we mitigate style mismatch for reasoning-native models by…
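
The fixed-budget selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the objective form (sum of importance scores plus a weighted sum of pairwise distances), the function name `greedy_select`, the trade-off weight `lam`, and the toy scores are all assumptions made for illustration. The paper's actual importance and diversity terms may differ.

```python
from typing import List, Sequence


def greedy_select(
    importance: Sequence[float],
    distance: Sequence[Sequence[float]],
    budget: int,
    lam: float = 1.0,
) -> List[int]:
    """Greedily pick up to `budget` item indices.

    Assumed objective (hypothetical, for illustration only):
        f(S) = sum_{i in S} importance[i]
               + lam * sum_{i < j in S} distance[i][j]
    Each round adds the item with the largest marginal gain.
    """
    n = len(importance)
    if budget < 0:
        raise ValueError("budget must be non-negative")
    budget = min(budget, n)

    selected: List[int] = []
    remaining = set(range(n))
    # div_gain[i] = total distance from item i to everything selected so far
    div_gain = [0.0] * n

    for _ in range(budget):
        # Marginal gain of adding i; ties go to the lower index.
        best = max(
            remaining,
            key=lambda i: (importance[i] + lam * div_gain[i], -i),
        )
        selected.append(best)
        remaining.remove(best)
        # Update diversity gains incrementally instead of recomputing.
        for i in remaining:
            div_gain[i] += distance[best][i]
    return selected


# Toy data: steps 0 and 1 are near-duplicates with high importance,
# step 3 is moderately important but far from both.
importance = [0.9, 0.85, 0.2, 0.6]
distance = [
    [0.0, 0.05, 0.9, 0.8],
    [0.05, 0.0, 0.9, 0.8],
    [0.9, 0.9, 0.0, 0.3],
    [0.8, 0.8, 0.3, 0.0],
]

print(greedy_select(importance, distance, budget=2, lam=1.0))  # [0, 3]
print(greedy_select(importance, distance, budget=2, lam=0.0))  # [0, 1]
```

With `lam=0.0` the selection reduces to taking the top-scoring steps and keeps the redundant pair. With `lam=1.0` the second pick shifts to the step that is far from the first, which is the importance-diversity trade-off the abstract describes.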

Code & Models

Repositories

fatemehpesaran310/weasel (GitHub)