Agentic Compilation: Mitigating the LLM Rerun Crisis for Minimized-Inference-Cost Web Automation
Jagadeesh Chundru

TL;DR
This paper introduces a compilation-based approach to reduce inference costs in LLM-driven web automation, enabling scalable, cost-effective, and reliable browser task execution.
Contribution
The authors propose a Compile-and-Execute architecture that decouples reasoning from execution, drastically lowering inference costs and improving reliability in web automation tasks.
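As a rough illustration of what decoupling reasoning from execution could look like, the sketch below replays a JSON blueprint against a browser object with no model call in the loop. The blueprint schema (`steps`, `action`, `target`, `value`), the `RecordingBrowser` stand-in, and the `execute` function are all hypothetical names chosen for this example; the paper's actual blueprint format and runtime are not shown in this summary.

```python
import json
from dataclasses import dataclass, field

# Hypothetical blueprint, standing in for what a one-shot LLM call might emit.
BLUEPRINT_JSON = """
{
  "workflow": "login",
  "steps": [
    {"action": "goto",  "target": "https://example.com/login"},
    {"action": "fill",  "target": "#email", "value": "user@example.com"},
    {"action": "click", "target": "button[type=submit]"}
  ]
}
"""


@dataclass
class RecordingBrowser:
    """Stand-in for a real browser driver; it only records the calls it receives."""
    log: list = field(default_factory=list)

    def goto(self, url):
        self.log.append(("goto", url))

    def fill(self, selector, value):
        self.log.append(("fill", selector, value))

    def click(self, selector):
        self.log.append(("click", selector))


def execute(blueprint, browser):
    """Run every step in order, deterministically, without any LLM inference."""
    handlers = {
        "goto": lambda s: browser.goto(s["target"]),
        "fill": lambda s: browser.fill(s["target"], s["value"]),
        "click": lambda s: browser.click(s["target"]),
    }
    for index, step in enumerate(blueprint["steps"]):
        action = step["action"]
        if action not in handlers:
            # Fail loudly so a person can patch the blueprint by hand.
            raise ValueError(f"step {index}: unsupported action {action!r}")
        handlers[action](step)
    return len(blueprint["steps"])


browser = RecordingBrowser()
steps_run = execute(json.loads(BLUEPRINT_JSON), browser)
print(f"executed {steps_run} steps")
```

Because the blueprint is plain data, the same file can be replayed any number of times, and a failing step can be corrected by editing one entry rather than re-invoking the model.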
Findings
Inference cost for a 5-step workflow run 500 times falls from approximately 150.00 USD (about 15.00 USD with aggressive caching) to under 0.10 USD per workflow.
Achieved 80-94% success rates in zero-shot compilation across various tasks.
Near-100% execution reliability with minimal human patching.
Abstract
LLM-driven web agents operating through continuous inference loops -- repeatedly querying a model to evaluate browser state and select actions -- exhibit a fundamental scalability constraint for repetitive tasks. We characterize this as the Rerun Crisis: the linear growth of token expenditure and API latency relative to execution frequency. For a 5-step workflow over 500 iterations, a continuous agent incurs approximately 150.00 USD in inference costs; even with aggressive caching, this remains near 15.00 USD. We propose a Compile-and-Execute architecture that decouples LLM reasoning from browser execution, reducing per-workflow inference cost to under 0.10 USD. A one-shot LLM invocation processes a token-efficient semantic representation from a DOM Sanitization Module (DSM) and emits a deterministic JSON workflow blueprint. A lightweight runtime then drives the browser without further…
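The linear growth behind the Rerun Crisis can be checked with a few lines of arithmetic. The sketch below backs out a per-call cost from the abstract's figures (150.00 USD and 15.00 USD for 5 steps over 500 iterations), assuming one model call per step and a uniform cost per call. It also assumes the compiled path pays for a single compilation and nothing per run, with 0.10 USD used as the stated upper bound. These are simplifying assumptions for illustration, not the paper's cost model, and the function names are hypothetical.

```python
STEPS = 5
ITERATIONS = 500

# Per-call costs implied by the abstract's totals (assumes uniform cost per call).
COST_PER_CALL = 150.00 / (STEPS * ITERATIONS)         # continuous agent
CACHED_COST_PER_CALL = 15.00 / (STEPS * ITERATIONS)   # with aggressive caching
COMPILE_COST_UPPER_BOUND = 0.10                        # one-shot compilation, USD


def continuous_cost(steps, iterations, cost_per_call):
    """Cost of an agent that queries the model at every step of every run."""
    return steps * iterations * cost_per_call


def compiled_cost(iterations, compile_cost):
    """Cost when the workflow is compiled once and replayed without inference.

    The iteration count is accepted only to show that it does not affect the total.
    """
    return compile_cost


for runs in (1, 50, 500):
    live = continuous_cost(STEPS, runs, COST_PER_CALL)
    cached = continuous_cost(STEPS, runs, CACHED_COST_PER_CALL)
    compiled = compiled_cost(runs, COMPILE_COST_UPPER_BOUND)
    print(f"{runs:>4} runs: continuous {live:8.2f}  cached {cached:7.2f}  compiled {compiled:5.2f} USD")
```

Under these assumptions the continuous and cached totals scale with the number of runs, while the compiled total stays flat.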
