Multi-Agent LLM Committees for Autonomous Software Beta Testing
Sumanth Bharadwaj Hachalli Karanam, Dhiwahar Adhithya Kennady

TL;DR
This paper introduces a multi-agent LLM committee framework for autonomous software beta testing. Committees of 2 to 4 agents raise task success from 78.0% (single-agent baseline) to between 91.7% and 100%, and the paper also reports gains in UI understanding and bug detection, with real-time performance and open-source tools.
Contribution
It presents a novel multi-agent committee approach combining diverse LLMs and visual understanding for effective, real-time, autonomous software testing, outperforming single-agent baselines.
Findings
Achieves 89.5% overall task success rate in testing scenarios.
Multi-agent configurations with 2 to 4 agents reach 91.7% to 100% task success, compared with 78.0% for single-agent baselines (a gain of 13.7 to 22.0 percentage points).
Attains 0.91 F1 score in bug detection, surpassing previous methods.
Abstract
Manual software beta testing is costly and time-consuming, while single-agent large language model (LLM) approaches suffer from hallucinations and inconsistent behavior. We propose a multi-agent committee framework in which diverse vision-enabled LLMs collaborate through a three-round voting protocol to reach consensus on testing actions. The framework combines model diversity, persona-driven behavioral variation, and visual user interface understanding to systematically explore web applications. Across 84 experimental runs with 9 testing personas and 4 scenarios, multi-agent committees achieve an 89.5 percent overall task success rate. Configurations with 2 to 4 agents reach 91.7 to 100 percent success, compared to 78.0 percent for single-agent baselines, yielding improvements of 13.7 to 22.0 percentage points. At the action level, the system attains a 93.1 percent success rate with a…
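The abstract does not spell out how the three-round voting protocol works, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each agent can be modeled as a function that sees the current observation plus the previous round's proposals, that consensus means a strict majority, and that a plurality vote is used as a fallback after the final round. The names `committee_decide`, `Agent`, and `quorum` are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical agent interface (an assumption, not from the paper):
# (observation, proposals from the previous round) -> proposed action.
Agent = Callable[[str, Sequence[str]], str]


def committee_decide(
    agents: Sequence[Agent],
    observation: str,
    rounds: int = 3,
    quorum: float = 0.5,
) -> str:
    """Run up to `rounds` voting rounds and return the agreed action.

    Assumption: consensus is a strict majority (share of votes > quorum).
    If no round reaches it, fall back to the most common proposal of the
    last round; ties go to the proposal that appeared first.
    """
    if not agents:
        raise ValueError("committee needs at least one agent")
    if rounds < 1:
        raise ValueError("rounds must be at least 1")

    proposals: list[str] = []
    for _ in range(rounds):
        # Each agent sees what its peers proposed in the previous round.
        previous = tuple(proposals)
        proposals = [agent(observation, previous) for agent in agents]
        action, votes = Counter(proposals).most_common(1)[0]
        if votes / len(agents) > quorum:
            return action  # consensus reached, stop early

    # No strict majority after the final round: plurality fallback.
    return Counter(proposals).most_common(1)[0][0]
```

In a real system each agent would wrap a call to a vision-enabled LLM with its own persona prompt; here an agent is any callable, which keeps the voting logic testable without a model.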
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Advanced Malware Detection Techniques
