Automated Repair of AI Code with Large Language Models and Formal Verification
Yiannis Charalambous, Edoardo Manino, Lucas C. Cordeiro

TL;DR
This paper presents an automated approach combining large language models and formal verification to detect and repair memory safety vulnerabilities in neural network code, enhancing AI system safety.
Contribution
It introduces a large-scale neural network code dataset, integrates formal verification with language models for automated repair, and evaluates different prompt engineering techniques.
Findings
Expanded NeuroCodeBench to 81k programs for testing.
Successfully verified memory safety and repaired vulnerabilities automatically.
Compared effectiveness of various prompt engineering methods.
Abstract
The next generation of AI systems requires strong safety guarantees. This report looks at the software implementation of neural networks and related memory safety properties, including NULL pointer deference, out-of-bound access, double-free, and memory leaks. Our goal is to detect these vulnerabilities, and automatically repair them with the help of large language models. To this end, we first expand the size of NeuroCodeBench, an existing dataset of neural network code, to about 81k programs via an automated process of program mutation. Then, we verify the memory safety of the mutated neural network implementations with ESBMC, a state-of-the-art software verifier. Whenever ESBMC spots a vulnerability, we invoke a large language model to repair the source code. For the latest task, we compare the performance of various state-of-the-art prompt engineering techniques, and an iterative…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsScientific Computing and Data Management · Software Engineering Research · Machine Learning and Data Classification
