ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data
Junhong Shen, Atishay Jain, Zedian Xiao, Ishan Amlekar, Mouad Hadji, Aaron Podolny, Ameet Talwalkar

TL;DR
ScribeAgent fine-tunes open-source LLMs with large-scale web workflow data, significantly enhancing web task performance and surpassing prompting-based models on benchmarks.
Contribution
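The paper introduces a fine-tuning approach that uses production-scale workflow data to build specialized web agents, achieving state-of-the-art direct generation performance on Mind2Web and a 7.3% higher task success rate on WebArena than the previous best text-only web agents.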
Findings
ScribeAgent outperforms prompting-based agents on the Mind2Web and WebArena benchmarks.
Fine-tuning with large-scale data improves web task success rates.
Detailed ablations provide insights into optimal training strategies.
Abstract
Large Language Model (LLM) agents are rapidly improving to handle increasingly complex web-based tasks. Most of these agents rely on general-purpose, proprietary models like GPT-4 and focus on designing better prompts to improve their planning abilities. However, general-purpose LLMs are not specifically trained to understand specialized web contexts such as HTML, and they often struggle with long-horizon planning. We explore an alternative approach that fine-tunes open-source LLMs using production-scale workflow data collected from over 250 domains corresponding to 6 billion tokens. This simple yet effective approach shows substantial gains over prompting-based agents on existing benchmarks -- ScribeAgent achieves state-of-the-art direct generation performance on Mind2Web and improves the task success rate by 7.3% over the previous best text-only web agents on WebArena. We further…
Taxonomy
Topics: Semantic Web and Ontologies · Business Process Modeling and Analysis · Multi-Agent Systems and Negotiation
Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Softmax
