Multi-Agent LLM Committees for Autonomous Software Beta Testing

Sumanth Bharadwaj Hachalli Karanam; Dhiwahar Adhithya Kennady

arXiv:2512.21352·cs.SE·December 29, 2025

Open Access

TL;DR

This paper introduces a multi-agent LLM committee framework for autonomous software beta testing. Compared with single-agent approaches, it reports higher task success rates (89.5% overall versus 78.0%), better UI understanding, and stronger bug detection, with real-time performance and open-source tools.

Contribution

It presents a novel multi-agent committee approach combining diverse LLMs and visual understanding for effective, real-time, autonomous software testing, outperforming single-agent baselines.

Findings

01

Achieves 89.5% overall task success rate in testing scenarios.

02

Multi-agent configurations with 2 to 4 agents reach 91.7% to 100% success, compared with 78.0% for single-agent baselines (a gain of 13.7 to 22.0 percentage points).

03

Attains 0.91 F1 score in bug detection, surpassing previous methods.

Abstract

Manual software beta testing is costly and time-consuming, while single-agent large language model (LLM) approaches suffer from hallucinations and inconsistent behavior. We propose a multi-agent committee framework in which diverse vision-enabled LLMs collaborate through a three-round voting protocol to reach consensus on testing actions. The framework combines model diversity, persona-driven behavioral variation, and visual user interface understanding to systematically explore web applications. Across 84 experimental runs with 9 testing personas and 4 scenarios, multi-agent committees achieve an 89.5 percent overall task success rate. Configurations with 2 to 4 agents reach 91.7 to 100 percent success, compared to 78.0 percent for single-agent baselines, yielding improvements of 13.7 to 22.0 percentage points. At the action level, the system attains a 93.1 percent success rate with a…
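The abstract describes agents reaching consensus on a testing action through a three-round voting protocol, but the excerpt does not spell out the voting rule. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a strict-majority rule, assumes each agent sees the previous round's votes before voting again, and uses hypothetical names (`Agent`, `committee_vote`, `stubborn`, `follower`) and hypothetical action labels such as `"click_login"`.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical agent interface (an assumption, not from the paper):
# given the round number and the previous round's votes, return an action label.
Agent = Callable[[int, Sequence[str]], str]


def committee_vote(
    agents: Sequence[Agent], rounds: int = 3, threshold: float = 0.5
) -> Optional[str]:
    """Run up to `rounds` voting rounds; return the first action whose vote
    share strictly exceeds `threshold`, or None if no round reaches consensus."""
    previous: Sequence[str] = []
    for round_no in range(1, rounds + 1):
        votes = [agent(round_no, previous) for agent in agents]
        action, count = Counter(votes).most_common(1)[0]
        if count / len(votes) > threshold:
            return action
        previous = votes  # agents may revise their vote after seeing these
    return None


def stubborn(action: str) -> Agent:
    """Stub agent that always proposes the same action."""
    return lambda round_no, previous: action


def follower(initial: str) -> Agent:
    """Stub agent that proposes `initial` first, then copies the first
    agent's vote from the previous round."""
    return lambda round_no, previous: previous[0] if previous else initial


committee = [stubborn("click_login"), stubborn("type_email"), follower("scroll")]
print(committee_vote(committee))  # prints "click_login" (consensus in round 2)
```

In this sketch a split committee that never reaches a strict majority returns `None`, leaving the caller to decide on a fallback; how the paper's framework handles that case is not stated in the excerpt.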

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Advanced Malware Detection Techniques