ReFuzzer: Feedback-Driven Approach to Enhance Validity of LLM-Generated Test Programs
Iti Shree, Karine Even-Mendoza, Tomasz Radzik

TL;DR
ReFuzzer is a feedback-driven framework that refines LLM-generated test programs by detecting and correcting errors, significantly increasing their validity and coverage in compiler testing.
Contribution
ReFuzzer introduces a feedback loop with a local LLM that detects and corrects compilation and runtime violations in generated test programs, so that fuzzing can exercise the compiler beyond crash detection.
Findings
Test program validity increased from 47.0-49.4% to 96.6-97.3%.
ReFuzzer improved code coverage in key compiler components.
Average processing time was 2.9-3.5 s per test program.
Abstract
Existing LLM-based compiler fuzzers often produce syntactically or semantically invalid test programs, limiting their effectiveness in exercising compiler optimizations and backend components. We introduce ReFuzzer, a framework for refining LLM-generated test programs by systematically detecting and correcting compilation and runtime violations (e.g. division by zero or array out-of-bounds accesses). ReFuzzer employs a feedback loop with a local LLM to validate and filter erroneous programs before execution, improving fuzzing effectiveness beyond crash detection and enabling the generation of diverse yet valid test programs. We evaluated ReFuzzer's effectiveness across black-, grey- and white-box fuzzing approaches targeting LLVM/Clang. ReFuzzer improved test programs' validity from 47.0-49.4% to 96.6-97.3%, with an average processing time of 2.9-3.5 s per test program on a dual-GPU…
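The abstract describes a loop in which a program is checked, the resulting violations are handed back to an LLM, and the revised program is checked again. A minimal sketch of that control flow follows. It is not the authors' implementation: `refine`, `RefineResult`, `toy_check`, and `toy_repair` are hypothetical names, and the two toy functions are string-matching stand-ins for what would in practice be compiler and runtime-check diagnostics and a call to a local LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical interfaces (assumptions, not from the paper):
# a checker returns diagnostics (empty list means the program is valid),
# a repairer returns a revised program given those diagnostics.
Checker = Callable[[str], List[str]]
Repairer = Callable[[str, List[str]], str]


@dataclass
class RefineResult:
    program: Optional[str]     # the valid program, or None if the loop gave up
    rounds: int                # number of repair requests issued
    history: List[List[str]]   # diagnostics observed at each check


def refine(program: str, check: Checker, repair: Repairer,
           max_rounds: int = 3) -> RefineResult:
    """Check, repair, and re-check until valid or the round budget is spent."""
    history: List[List[str]] = []
    for rounds in range(max_rounds + 1):
        diagnostics = check(program)
        history.append(diagnostics)
        if not diagnostics:
            return RefineResult(program, rounds, history)
        if rounds == max_rounds:
            break  # budget exhausted: filter the program out
        program = repair(program, diagnostics)
    return RefineResult(None, max_rounds, history)


def toy_check(src: str) -> List[str]:
    """Stand-in for compiler and runtime-check output (assumption)."""
    diagnostics = []
    if "/ 0" in src:
        diagnostics.append("division by zero")
    if not src.rstrip().endswith("}"):
        diagnostics.append("expected '}'")
    return diagnostics


def toy_repair(src: str, diagnostics: List[str]) -> str:
    """Stand-in for the LLM: fixes only the first reported diagnostic."""
    first = diagnostics[0]
    if first == "division by zero":
        return src.replace("/ 0", "/ 1")
    if first == "expected '}'":
        return src + "}"
    return src


if __name__ == "__main__":
    result = refine("int main() { return 4 / 0; ", toy_check, toy_repair)
    print(result.program, result.rounds)
```

Passing the checker and repairer as callables keeps the loop independent of any particular compiler invocation or model. A program that is still invalid when the round budget runs out is returned as `None`, which mirrors the abstract's point that erroneous programs are filtered before execution.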
