TL;DR
WebTestPilot is an LLM-based web testing agent that infers implicit oracles from symbolized GUI elements and natural language specifications, substantially improving bug detection precision and recall.
Contribution
It introduces a novel symbolization layer and implicit oracle inference method for end-to-end web testing with natural language specifications.
Findings
Achieves a 99% task completion rate in web testing.
Attains 96% precision and recall in bug detection.
Outperforms baseline methods by 70 points in precision and 27 points in recall.
Abstract
Visual language model (VLM) agents show great promise in automating end-to-end (E2E) web testing against requirements written in natural language. However, language models are inherently probabilistic and prone to hallucination. As a result, when the agent detects an inconsistency between the requirement and the web application, it is hard to tell whether the inconsistency stems from a hallucination or from a real application bug. Addressing this issue presents two core technical challenges: the implicit oracle inference challenge, where the agent must act as its own oracle and decide, without explicit guidance, whether the application's behavior is correct; and the probabilistic inference challenge, where an LLM's inconsistent reasoning undermines its trustworthiness as an oracle. Existing LLM-based approaches fail to capture such implicit oracles, either by treating any page navigation that doesn't crash as a…
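To make the two challenges more concrete, here is a minimal sketch of one general idea behind checking oracles over symbolized GUI state. It is not WebTestPilot's implementation. Everything in it is hypothetical and chosen only for illustration: the `Element` record, the `symbolize` and `check` helpers, and the shopping-cart oracle. The sketch assumes page elements have already been mapped to stable symbols. Once that is done, an implicit expectation such as "adding an item raises the cart counter by one" can be written as an ordinary predicate over the states before and after an action. Evaluating that predicate is deterministic, so the verdict does not vary from run to run the way a freshly sampled LLM judgment might.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Element:
    """Hypothetical symbolized GUI element: a stable symbol plus a few attributes."""
    symbol: str  # stable identifier, e.g. "cart_count"
    role: str    # e.g. "button" or "text"
    text: str    # visible text content


# A symbolized page state maps each symbol to its element (hypothetical format).
State = dict[str, Element]

# An oracle is a predicate over the states before and after an action.
Oracle = Callable[[State, State], bool]


def symbolize(raw: list[tuple[str, str, str]]) -> State:
    """Turn (symbol, role, text) tuples into a symbolized state."""
    return {sym: Element(sym, role, text) for sym, role, text in raw}


def cart_count_incremented(before: State, after: State) -> bool:
    """Hypothetical implicit oracle: adding an item raises the cart counter by one."""
    return int(after["cart_count"].text) == int(before["cart_count"].text) + 1


def check(oracle: Oracle, before: State, after: State) -> str:
    """Evaluate the oracle with plain code, so the same states always give the same verdict."""
    return "pass" if oracle(before, after) else "bug"


before = symbolize([("add_btn", "button", "Add to cart"), ("cart_count", "text", "2")])
after_ok = symbolize([("add_btn", "button", "Add to cart"), ("cart_count", "text", "3")])
after_bad = symbolize([("add_btn", "button", "Add to cart"), ("cart_count", "text", "2")])

print(check(cart_count_incremented, before, after_ok))   # pass
print(check(cart_count_incremented, before, after_bad))  # bug
```

Under this framing, a language model would only be needed to infer the predicate and to map screen content to symbols. Judging a particular run would be left to code. How the paper actually divides that work is described in the full text, not here.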
