DiffSpec: Differential Testing with LLMs using Natural Language Specifications and Code Artifacts
Nikitha Rao, Elizabeth Gilbert, Harrison Green, Tahina Ramananandro, Nikhil Swamy, Claire Le Goues, Sarah Fakhoury

TL;DR
DiffSpec uses large language models to generate targeted differential tests from natural language specifications and code artifacts, uncovering four bugs in eBPF and two in Wasm validators.
Contribution
The paper introduces DiffSpec, a novel framework that uses prompt chaining with LLMs to generate differential tests from natural language specs and code artifacts, revealing previously unknown bugs.
Findings
Generated 1901 tests for eBPF, uncovering four bugs.
Produced 299 tests for Wasm validators, identifying two bugs.
Demonstrated effectiveness on real-world systems like eBPF and Wasm validators.
Abstract
Differential testing can be an effective way to find bugs in software systems with multiple implementations that conform to the same specification, like compilers, network protocol parsers, or language runtimes. Specifications for such systems are often standardized in natural language documents, like Instruction Set Architecture (ISA) specifications or IETF RFCs. Large Language Models (LLMs) have demonstrated potential in both generating tests and handling large volumes of natural language text, making them well-suited for analyzing artifacts like specification documents, bug reports, and code implementations. In this work, we leverage natural language and code artifacts to guide LLMs to generate targeted tests that highlight meaningful behavioral differences between implementations, including those corresponding to bugs. We introduce DiffSpec, a framework for generating differential…
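The core idea of differential testing described in the abstract can be illustrated with a minimal Python sketch. This is not DiffSpec's implementation: the `differential_test` helper and the two toy validators are hypothetical names introduced here for illustration, and the "specification" (accept programs of at most four characters) is an assumption. The harness runs each input through every implementation and reports the inputs on which they disagree.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Outcome = Tuple[str, str]  # ("ok", repr of result) or ("error", exception name)


def differential_test(
    implementations: Dict[str, Callable[[str], object]],
    inputs: Iterable[str],
) -> List[Tuple[str, Dict[str, Outcome]]]:
    """Return every input on which the implementations disagree."""
    disagreements = []
    for test_input in inputs:
        outcomes: Dict[str, Outcome] = {}
        for name, impl in implementations.items():
            try:
                outcomes[name] = ("ok", repr(impl(test_input)))
            except Exception as exc:  # a crash is also an observable behavior
                outcomes[name] = ("error", type(exc).__name__)
        # More than one distinct outcome means the implementations differ.
        if len(set(outcomes.values())) > 1:
            disagreements.append((test_input, outcomes))
    return disagreements


# Hypothetical validators for an assumed rule: accept at most 4 characters.
def validator_a(program: str) -> bool:
    return len(program) <= 4


def validator_b(program: str) -> bool:
    return len(program) < 4  # deliberate off-by-one bug


found = differential_test(
    {"a": validator_a, "b": validator_b},
    ["abc", "abcd", "abcde"],
)
for test_input, outcomes in found:
    print(test_input, outcomes)
```

Only the boundary input `"abcd"` separates the two validators here, which is why targeted test generation matters: inputs that do not exercise a point where the implementations diverge reveal nothing.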
Taxonomy
Topics: Natural Language Processing Techniques
