Autonomous QA Agent: A Retrieval-Augmented Framework for Reliable Selenium Script Generation
Dudekula Kasim Vali

TL;DR
The paper introduces the Autonomous QA Agent, a retrieval-augmented system that improves the accuracy of Selenium script generation by grounding it in project-specific documentation and HTML structure, which reduces hallucinated UI elements.
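A hallucinated UI element is a locator that the generated script uses but the page does not contain. The summary does not say how such errors are detected. Below is a minimal sketch of one simple check. It assumes the generated scripts are Python Selenium code that locates elements with `By.ID`, and it compares those locators against the `id` attributes in the page HTML. `IdCollector`, `missing_ids`, and the sample page are hypothetical illustrations. Note that this is a post-hoc validation, not the paper's method: the paper avoids the problem by retrieving HTML context before generation.

```python
import re
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects every id attribute that appears in an HTML page (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


# Matches Python Selenium calls such as find_element(By.ID, "checkout-btn").
BY_ID_PATTERN = re.compile(r"""find_element\(\s*By\.ID\s*,\s*["']([^"']+)["']\s*\)""")


def missing_ids(script: str, html: str) -> list[str]:
    """Return By.ID locators used in `script` that do not exist in `html`."""
    collector = IdCollector()
    collector.feed(html)
    return [loc for loc in BY_ID_PATTERN.findall(script) if loc not in collector.ids]


page = '<form><input id="email"><button id="checkout-btn">Pay</button></form>'
generated = '''
driver.find_element(By.ID, "email").send_keys("a@b.com")
driver.find_element(By.ID, "place-order").click()
'''
print(missing_ids(generated, page))  # ['place-order']
```

Here `place-order` is flagged because no element on the page carries that `id`. A fuller check would also cover CSS and XPath locators.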
Contribution
It presents a novel RAG framework for UI test script generation that improves syntax validity and execution success over standard LLM generation (90% vs. 30% execution success on 20 e-commerce scenarios).
Findings
Achieved 100% (20/20) syntax validity in generated scripts.
Reached a 90% (18/20) execution success rate, compared to 30% for standard LLM generation.
Grounding generation in the page's DOM structure reduces hallucinated UI elements.
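The two headline metrics are simple proportions over the 20 test scenarios. Below is a minimal sketch of how they could be computed. The summary does not give the exact criteria, so the sketch assumes that "syntax validity" means the script parses as Python and that "execution success" is a per-scenario pass/fail outcome. The script strings and outcome list are placeholders that only mirror the reported counts (20/20 and 18/20). They are not the paper's data.

```python
import ast


def is_syntactically_valid(source: str) -> bool:
    """True if the generated script parses as Python (it may still fail at run time)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def rate(outcomes: list[bool]) -> float:
    """Fraction of scenarios whose outcome is True."""
    return sum(outcomes) / len(outcomes)


# Placeholder inputs mirroring the reported figures: 20 scenarios, all parse,
# and 18 of them run to completion against the application.
scripts = ["driver.get('https://shop.example')"] * 20
executed_ok = [True] * 18 + [False] * 2

print(rate([is_syntactically_valid(s) for s in scripts]))  # 1.0
print(rate(executed_ok))  # 0.9
```

The two metrics are separate because a script can parse cleanly yet still fail when run, for example when it references an element that does not exist on the page.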
Abstract
Software testing is critical in the software development lifecycle, yet translating requirements into executable test scripts remains manual and error-prone. While Large Language Models (LLMs) can generate code, they often hallucinate non-existent UI elements. We present the Autonomous QA Agent, a Retrieval-Augmented Generation (RAG) system that grounds Selenium script generation in project-specific documentation and HTML structure. By ingesting diverse formats (Markdown, PDF, HTML) into a vector database, our system retrieves relevant context before generation. Evaluation on 20 e-commerce test scenarios shows our RAG approach achieves 100% (20/20) syntax validity and 90% (18/20, 95% CI: [85%, 95%], p < 0.001) execution success, compared to 30% for standard LLM generation. While our evaluation is limited to a single domain, our method significantly reduces hallucinations by grounding…
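The abstract describes the core pipeline: ingest documentation and HTML into a vector database, then retrieve relevant context before generation. Below is a minimal sketch of that retrieve-then-prompt step. The summary does not name the embedding model or vector database, so a bag-of-words cosine similarity stands in for embeddings and an in-memory list stands in for the database. `retrieve`, `build_prompt`, the prompt wording, and the sample chunks are hypothetical. The LLM call itself is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lower-cased word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (an in-memory 'vector store')."""
    q = tokenize(query)
    return sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Prepend the retrieved context so the generator can reuse real locators."""
    context = "\n---\n".join(retrieve(query, chunks, k))
    return (
        "Write a Python Selenium test using ONLY elements present in the context.\n"
        f"Context:\n{context}\n\nTest case: {query}"
    )


# Chunks as they might look after ingesting a Markdown spec and an HTML page.
chunks = [
    "Checkout: a discount code field lets users apply code SAVE15 for 15% off.",
    '<input id="discount-code"><button id="apply-discount">Apply</button>',
    "Shipping: orders over $50 ship free.",
]
print(build_prompt("apply discount code SAVE15 at checkout", chunks))
```

For this query, the specification chunk and the HTML snippet rank highest and the unrelated shipping note is left out. The intent is that the generator sees real attributes such as `discount-code` rather than guessing a locator. A production system would replace the toy similarity with learned embeddings and a persistent vector store.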
