Bootstrapping Fuzzers for Compilers of Low-Resource Language Dialects Using Language Models
Sairam Vaidya, Marcel Böhme, Loris D'Antoni

TL;DR
This paper introduces Germinator, a novel approach that uses language models and grammar-based fuzzing to automatically generate effective test inputs for compiler dialects, improving bug detection and coverage in low-resource dialects.
Contribution
The paper presents Germinator, a tool that combines grammar extraction and language models to generate diverse seed inputs for dialect-agnostic fuzzing of compilers, addressing limitations of prior methods.
Findings
Seeds improve line coverage by 10-120% over baselines.
Discovered 88 previously unknown bugs, including 23 in low-resource dialects.
Enables effective testing of heterogeneous dialect ecosystems at scale.
Abstract
Modern extensible compiler frameworks, such as MLIR, enable rapid creation of domain-specific language dialects. This flexibility, however, makes correctness harder to ensure as the same extensibility that accelerates development also complicates maintaining the testing infrastructure. Extensible languages require automated test generation that is both dialect-agnostic (works across dialects without manual adaptation) and dialect-effective (targets dialect-specific features to find bugs). Existing approaches typically sacrifice one of these goals by either requiring manually constructed seed corpora for each dialect, or by failing to be effective. We present a dialect-agnostic and dialect-effective grammar-based and coverage-guided fuzzing approach for extensible compilers that combines two key insights from existing work: (i) the grammars of dialects, which already encode the structural…
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Natural Language Processing Techniques
