Cybernaut: Towards Reliable Web Automation

Ankur Tomar; Hengyue Liang; Indranil Bhattacharya; Natalia Larios; Francesco Carbone

arXiv:2508.16688·cs.SE·August 26, 2025

TL;DR

Cybernaut is a framework that improves the reliability and consistency of AI-driven web automation in complex enterprise environments by introducing standard operating procedure (SOP) generation, high-precision element recognition, and a new performance metric.

Contribution

It introduces a comprehensive framework with SOP generation, precise DOM recognition, and a new metric, addressing challenges in automating complex internal web interfaces.

Findings

1. 23.2% improvement in task success rate
2. 84.7% accuracy in identifying consistent execution patterns
3. Effective in enterprise-scale web automation

Abstract

The emergence of AI-driven web automation through Large Language Models (LLMs) offers unprecedented opportunities for optimizing digital workflows. However, deploying such systems in real-world industry environments presents four core challenges: (1) ensuring consistent execution, (2) accurately identifying critical HTML elements, (3) achieving human-like accuracy in order to automate operations at scale, and (4) the lack of comprehensive benchmarking data on internal web applications. Existing solutions are primarily tailored for well-designed, consumer-facing websites (e.g., Amazon.com, Apple.com) and fall short in addressing the complexity of poorly designed internal web interfaces. To address these limitations, we present Cybernaut, a novel framework to ensure high execution consistency in web automation agents designed for robust enterprise use. Our contributions are threefold:…
