Cerberus: Multi-Agent Reasoning and Coverage-Guided Exploration for Static Detection of Runtime Errors
Hridya Dhulipala, Xiaokai Rong, Tien N. Nguyen

TL;DR
Cerberus is a novel execution-free testing framework that uses large language models to predict code coverage and detect runtime errors in code snippets, improving error detection efficiency.
Contribution
It introduces a two-phase feedback loop leveraging LLMs for coverage-guided testing without executing code, enhancing runtime error detection in incomplete snippets.
Findings
Outperforms traditional testing methods in error detection.
Generates high-coverage test cases efficiently.
Discovers more runtime errors in code snippets.
Abstract
In several software development scenarios, it is desirable to detect runtime errors and exceptions in code snippets without actual execution. A typical example is detecting runtime exceptions in online code snippets before integrating them into a codebase. In this paper, we propose Cerberus, a novel predictive, execution-free, coverage-guided testing framework. Cerberus uses LLMs to generate inputs that trigger runtime errors and to perform code coverage prediction and error detection without code execution. With a two-phase feedback loop, Cerberus first aims to both increase code coverage and detect runtime errors, then shifts to focus only on detecting runtime errors once coverage reaches 100% or its maximum, enabling it to perform better than prompting the LLMs for both purposes. Our empirical evaluation demonstrates that Cerberus performs better than conventional and…
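The two-phase feedback loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `two_phase_loop`, `generate`, `predict`, `budget`, and `patience` are hypothetical, the two callables stand in for LLM prompts (no model is called here), and treating "maximum coverage" as a plateau of `patience` rounds without newly covered lines is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical shape of a prediction: (predicted covered lines, predicted error or None).
Prediction = Tuple[Set[int], Optional[str]]


@dataclass
class LoopResult:
    covered: Set[int] = field(default_factory=set)
    errors: List[Tuple[str, str]] = field(default_factory=list)  # (input, error)
    phase_log: List[int] = field(default_factory=list)  # phase used in each round


def two_phase_loop(
    snippet: str,
    total_lines: int,
    generate: Callable[[str, str, LoopResult], str],  # stand-in for an LLM prompt
    predict: Callable[[str, str], Prediction],        # stand-in for an LLM prompt
    budget: int = 10,
    patience: int = 2,  # assumption: plateau length that counts as "maximum"
) -> LoopResult:
    result = LoopResult()
    phase, stalled = 1, 0
    for _ in range(budget):
        # Phase 1 asks for inputs that raise coverage and trigger errors;
        # phase 2 asks only for error-triggering inputs.
        goal = "coverage+errors" if phase == 1 else "errors"
        test_input = generate(snippet, goal, result)
        lines, error = predict(snippet, test_input)  # nothing is executed
        result.phase_log.append(phase)
        if error is not None:
            result.errors.append((test_input, error))
        new_lines = lines - result.covered
        result.covered |= lines
        if phase == 1:
            stalled = 0 if new_lines else stalled + 1
            if len(result.covered) == total_lines or stalled >= patience:
                phase = 2  # coverage is full or has plateaued
    return result


# Toy stand-ins so the sketch runs without any model; all values are made up.
TOY_INPUTS = ["1", "0", "-1", "abc"]
TOY_PREDICTIONS = {
    "1": ({1, 2}, None),
    "0": ({1, 3}, "ZeroDivisionError"),
    "-1": ({1, 2, 4}, None),
    "abc": ({1}, "ValueError"),
}


def toy_generate(snippet: str, goal: str, state: LoopResult) -> str:
    # Cycle through fixed inputs; a real generator would condition on goal and state.
    return TOY_INPUTS[len(state.phase_log) % len(TOY_INPUTS)]


def toy_predict(snippet: str, test_input: str) -> Prediction:
    return TOY_PREDICTIONS[test_input]


result = two_phase_loop("<snippet text>", 4, toy_generate, toy_predict, budget=4)
```

In this toy run the loop stays in phase 1 until all four lines are predicted as covered, then spends its remaining budget in phase 2. Keeping the generator and predictor as plain callables is a design choice of the sketch, so that either could be swapped for a real model call.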
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Software System Performance and Reliability
