Automating Detection and Root-Cause Analysis of Flaky Tests in Quantum Software
Janakan Sivaloganathan, Ainaz Jamshidi, Andriy Miranskyy, Lei Zhang

TL;DR
This paper introduces an automated pipeline that uses Large Language Models (LLMs) to detect flaky tests in quantum software and help identify their root causes, expanding an existing quantum flaky test dataset by 54%.
Contribution
The work automates flaky-test discovery and root-cause analysis in quantum software using LLMs, reporting F1-scores above 0.94 on both tasks and expanding an existing dataset.
Findings
Identified 25 new flaky tests, increasing dataset size by 54%.
Google Gemini achieved F1-scores of 0.9420 for detection and 0.9643 for root-cause identification.
Demonstrated LLMs' practical utility in quantum software testing.
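For readers less familiar with the metric reported above, the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation; the function name and the confusion counts in the usage note are hypothetical and are not taken from the paper.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall).

    The counts are illustrative inputs, not results from the paper.
    """
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    # Guard against division by zero when a class is never predicted or never present.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

As an illustration, a classifier that flags 10 issues as flaky, 8 of them correctly, while missing 2 truly flaky issues, has precision 0.8 and recall 0.8, so `f1_score(8, 2, 2)` is 0.8 up to floating-point rounding.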
Abstract
Like classical software, quantum software systems rely on automated testing. However, their inherently probabilistic outputs make them susceptible to quantum flakiness -- tests that pass or fail inconsistently without code changes. Such quantum flaky tests can mask real defects and reduce developer productivity, yet systematic tooling for their detection and diagnosis remains limited. This paper presents an automated pipeline to detect flaky-test-related issues and pull requests in quantum software repositories and to support the identification of their root causes. We aim to expand an existing quantum flaky test dataset and evaluate the capability of Large Language Models (LLMs) for flakiness classification and root-cause identification. Building on a prior manual analysis of 14 quantum software repositories, we automate the discovery of additional flaky test cases using LLMs and…
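To make the notion of quantum flakiness concrete, the sketch below simulates it in plain Python under simplifying assumptions: measuring an equal superposition is modeled as a fair coin flip per shot, and a test asserts that the observed frequency of 1s lies within a fixed tolerance of 0.5. The function names, shot counts, and tolerances are hypothetical and do not come from the paper or from any quantum SDK.

```python
import random


def measure_equal_superposition(shots: int, rng: random.Random) -> int:
    """Simulate `shots` measurements; each yields 1 with probability 0.5.

    Returns the number of 1 outcomes. This stands in for a real backend.
    """
    return sum(rng.random() < 0.5 for _ in range(shots))


def frequency_test_passes(shots: int, tolerance: float, rng: random.Random) -> bool:
    """A typical statistical assertion: observed frequency is close to 0.5."""
    ones = measure_equal_superposition(shots, rng)
    return abs(ones / shots - 0.5) <= tolerance


def failure_rate(trials: int, shots: int, tolerance: float, seed=None) -> float:
    """Rerun the same unchanged test `trials` times and report how often it fails."""
    rng = random.Random(seed)
    failures = sum(
        not frequency_test_passes(shots, tolerance, rng) for _ in range(trials)
    )
    return failures / trials
```

Calling `failure_rate(200, 100, 0.05)` typically returns a value strictly between 0 and 1: the same test, with no code change, sometimes passes and sometimes fails. Two common mitigations, passing a fixed `seed` or using a looser `tolerance` with more shots, make the outcome reproducible or the failure rare, at the cost of test sensitivity.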
Taxonomy
Topics: Quantum Computing Algorithms and Architecture · Software System Performance and Reliability · Software Engineering Research
