Beyond ChatGPT: Enhancing Software Quality Assurance Tasks with Diverse LLMs and Validation Techniques
Ratnadira Widyasari, David Lo, Lizi Liao

TL;DR
This paper evaluates various large language models beyond ChatGPT for software quality assurance tasks, demonstrating that combining multiple models and validation techniques significantly improves fault localization and vulnerability detection performance.
Contribution
It provides a comprehensive comparison of multiple LLMs in SQA tasks and introduces novel ensemble and validation methods to enhance their effectiveness.
Findings
Several LLMs outperform GPT-3.5 in SQA tasks.
Combining the LLMs' results via voting improves performance by over 10% relative to GPT-3.5.
Validating one LLM's answer against another's (cross-validation) yields up to 16% improvement.
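The voting and validation ideas in these findings can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tie-breaking rule, the "vulnerable"/"safe" labels, and the `validator` callable (a stand-in for a second LLM asked to check an answer) are all assumptions made for illustration. Only the model names come from the abstract.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Optional


def majority_vote(predictions: Dict[str, Hashable],
                  tie_break_order: Optional[List[str]] = None) -> Hashable:
    """Return the answer that the most models agree on.

    `predictions` maps a model name to that model's answer.
    Ties go to the earliest model in `tie_break_order`, then to
    insertion order of `predictions` (an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions.values())
    best = max(counts.values())
    tied = {answer for answer, n in counts.items() if n == best}
    # Walk the preferred models first, then every model, so a tied
    # answer is always found.
    order = list(tie_break_order or []) + list(predictions)
    for model in order:
        if model in predictions and predictions[model] in tied:
            return predictions[model]
    raise AssertionError("unreachable: some model holds a tied answer")


def cross_validate(answer: Hashable,
                   validator: Callable[[Hashable], bool],
                   fallback: Hashable) -> Hashable:
    """Keep `answer` if the validator accepts it, else use `fallback`.

    `validator` is hypothetical: in practice it would wrap a call to a
    second LLM with a validation prompt.
    """
    return answer if validator(answer) else fallback


# Hypothetical per-model labels for one code snippet.
votes = {
    "GPT-3.5": "vulnerable",
    "LLaMA-3-70B": "safe",
    "Mixtral-8x7B": "vulnerable",
}
combined = majority_vote(votes)
checked = cross_validate(combined, lambda a: a == "vulnerable", "safe")
```

Here `combined` is the label two of the three models agree on, and `checked` keeps it because the stand-in validator accepts it. A real pipeline would replace the lambda with a model call and would need its own rule for what to do when the validator disagrees.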
Abstract
With the advancement of Large Language Models (LLMs), their application in Software Quality Assurance (SQA) has increased. However, the current focus of these applications is predominantly on ChatGPT. There remains a gap in understanding the performance of various LLMs in this critical domain. This paper aims to address this gap by conducting a comprehensive investigation into the capabilities of several LLMs across two SQA tasks: fault localization and vulnerability detection. We conducted comparative studies using GPT-3.5, GPT-4o, and four other publicly available LLMs (LLaMA-3-70B, LLaMA-3-8B, Gemma-7B, and Mixtral-8x7B), to evaluate their effectiveness in these tasks. Our findings reveal that several LLMs can outperform GPT-3.5 in both tasks. Additionally, even the lower-performing LLMs provided unique correct predictions, suggesting the potential of combining different LLMs'…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Scientific Computing and Data Management
