TL;DR
Weasel is a trajectory selection method for offline training of web agents. It selects a fixed-budget subset of trajectory steps that balances importance with diversity, improving out-of-domain generalization and training efficiency.
Contribution
The paper introduces an importance-diversity trajectory selection algorithm, combined with AXTree pruning and style-matching techniques, for efficient offline training of web agents.
Findings
Achieves 9.7-12.5× training speedups over standard fine-tuning.
Improves out-of-domain performance across multiple datasets and models.
Reduces training cost while maintaining or improving agent capabilities.
Abstract
Large language models (LLMs) have enabled web agents that follow natural language goals through multi-step browser interactions. However, agents fine-tuned on specific trajectories and domains often struggle to generalize out of domain, and offline training can be compute-inefficient due to noisy, redundant trajectories and long accessibility-tree (AXTree) states. To address both issues, we propose Weasel, a trajectory selection method for offline training of web agents. Weasel selects a fixed-budget subset of trajectory steps by optimizing an objective that balances unary importance with pairwise diversity over states, websites, and interaction patterns, which it solves efficiently with a greedy algorithm. We further improve efficiency with target-centered AXTree pruning that keeps only content around the ground-truth action target, and we mitigate style mismatch for reasoning-native models by…
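The abstract does not give Weasel's exact objective, so the following is only a generic sketch of greedy selection under a fixed budget, assuming a score of the form "sum of unary importance minus a weighted sum of pairwise similarity among selected steps". The function name `greedy_select`, the weight `lam`, and the toy scores are all hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def greedy_select(
    importance: Sequence[float],
    similarity: Sequence[Sequence[float]],
    budget: int,
    lam: float = 1.0,
) -> List[int]:
    """Greedily pick up to `budget` step indices.

    Assumed objective (illustrative, not the paper's):
        sum_i importance[i] - lam * sum_{i<j} similarity[i][j]
    over the selected set. Each round adds the step with the largest
    marginal gain; ties go to the lowest index.
    """
    n = len(importance)
    selected: List[int] = []
    remaining = set(range(n))
    # penalty[i] = total similarity between step i and the steps chosen so far
    penalty = [0.0] * n

    while remaining and len(selected) < budget:
        best = max(
            remaining,
            key=lambda i: (importance[i] - lam * penalty[i], -i),
        )
        selected.append(best)
        remaining.remove(best)
        # Update the diversity penalty of every step still available.
        for j in remaining:
            penalty[j] += similarity[best][j]

    return selected


# Toy data: steps 0 and 1 are near-duplicates, step 2 is different.
importance = [0.9, 0.85, 0.4]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

picked = greedy_select(importance, similarity, budget=2)
```

In this toy case the diversity term makes the second pick skip the near-duplicate step 1 in favour of step 2, whereas setting `lam=0.0` reduces the procedure to a plain top-k by importance.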
