SteP: Stacked LLM Policies for Web Actions
Paloma Sodhi, S.R.K. Branavan, Yoav Artzi, Ryan McDonald

TL;DR
SteP introduces a dynamic, stack-based policy composition method for large language models to effectively perform diverse web tasks, improving performance over state-of-the-art approaches.
Contribution
This paper presents SteP, a novel approach that enables dynamic control of multiple policies in web tasks, surpassing static hierarchy methods.
Findings
SteP improves the WebArena success rate from 14.9% to 33.5% over SOTA.
SteP is competitive with prior work on MiniWoB++ while using less data.
Code and data are publicly available.
Abstract
Performing tasks on the web presents fundamental challenges to large language models (LLMs), including combinatorially large open-world tasks and variations across web interfaces. Simply specifying a large prompt to handle all possible behaviors and states is extremely complex, and results in behavior leaks between unrelated behaviors. Decomposition to distinct policies can address this challenge, but requires carefully handing off control between policies. We propose Stacked LLM Policies for Web Actions (SteP), an approach to dynamically compose policies to solve a diverse set of web tasks. SteP defines a Markov Decision Process where the state is a stack of policies representing the control state, i.e., the chain of policy calls. Unlike traditional methods that are restricted to static hierarchies, SteP enables dynamic control that adapts to the complexity of the task. We evaluate…
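The stack-as-control-state idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Act`, `Call`, `Return`, `Frame`, and `run`, and the two hard-coded policies, are hypothetical stand-ins, and each policy here is a plain Python function where the paper would prompt an LLM. The sketch assumes a policy can do one of three things at each step: emit a primitive web action, call another policy (push), or return control to its caller (pop).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Act:
    """A primitive web action (hypothetical string format)."""
    command: str


@dataclass
class Call:
    """Hand control to another policy: pushes a frame on the stack."""
    policy: str
    instruction: str


@dataclass
class Return:
    """Finish the current policy: pops its frame off the stack."""
    value: str = ""


Step = Union[Act, Call, Return]


@dataclass
class Frame:
    """One stack entry: a policy, its instruction, and its own history."""
    policy: str
    instruction: str
    history: List[str] = field(default_factory=list)


# Stub policies (assumption: an LLM prompt would stand in for each function).
def search_policy(instruction: str, history: List[str]) -> Step:
    if not history:
        return Act(f"type [search box] {instruction}")
    if len(history) == 1:
        return Act("click [search button]")
    return Return("search done")


def task_policy(instruction: str, history: List[str]) -> Step:
    if not history:
        return Call("search", "red shoes")  # delegate to a sub-policy
    return Return("task done")


POLICIES: Dict[str, Callable[[str, List[str]], Step]] = {
    "task": task_policy,
    "search": search_policy,
}


def run(root: str, instruction: str, policies=POLICIES, max_steps: int = 20) -> List[str]:
    """Run the policy on top of the stack until the stack is empty."""
    stack = [Frame(root, instruction)]
    trace: List[str] = []
    for _ in range(max_steps):
        if not stack:
            break
        top = stack[-1]
        step = policies[top.policy](top.instruction, top.history)
        if isinstance(step, Act):
            trace.append(step.command)          # would be sent to the browser
            top.history.append(step.command)
        elif isinstance(step, Call):
            top.history.append(f"call {step.policy}")
            stack.append(Frame(step.policy, step.instruction))
        else:  # Return
            stack.pop()
            if stack:  # report the result to the calling policy
                stack[-1].history.append(f"returned: {step.value}")
    return trace
```

Calling `run("task", "buy red shoes")` yields the two primitive actions issued by the search sub-policy, after which both frames pop and the loop ends. Because the stack grows and shrinks at run time, the depth of delegation is decided by the policies themselves and not fixed in advance, which is the contrast with static hierarchies that the abstract draws.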
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Dropout · Dense Connections · Label Smoothing · Residual Connection · Softmax · Adam
