Enabling Memory Safety of C Programs using LLMs
Nausheen Mohammed, Akash Lal, Aseem Rastogi, Subhajit Roy, and Rahul Sharma

TL;DR
This paper introduces MSA, a tool leveraging Large Language Models to automate porting C code to a safe dialect, significantly reducing manual effort and improving memory safety in real-world and benchmark code.
Contribution
It presents a novel framework combining LLMs with static analysis for whole-program transformations to enhance memory safety in C programs.
Findings
MSA outperforms baseline LLM approaches in code porting tasks.
MSA achieves better results than existing symbolic techniques.
MSA handles large codebases of up to 20K lines effectively.
Abstract
Memory safety violations in low-level code, written in languages like C, continue to be one of the major sources of software vulnerabilities. One method of removing such violations by construction is to port C code to a safe C dialect. Such dialects rely on programmer-supplied annotations to guarantee safety with minimal runtime overhead. This porting, however, is a manual process that imposes significant burden on the programmer and, hence, there has been limited adoption of this technique. The task of porting not only requires inferring annotations, but may also need refactoring/rewriting of the code to make it amenable to such annotations. In this paper, we use Large Language Models (LLMs) towards addressing both these concerns. We show how to harness LLM capabilities to do complex code reasoning as well as rewriting of large codebases. We also present a novel framework for…
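To make the idea of annotation-based porting concrete, here is a minimal sketch in plain C. It is not the paper's tool output and does not use the syntax of any particular safe dialect. The names `bounded_ints`, `read_unchecked`, and `read_checked` are hypothetical, and the struct stands in for a bounds annotation that a real dialect would attach to the pointer and have its compiler enforce.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical stand-in for a bounds annotation: the pointer travels
   together with the number of elements it is allowed to access. */
typedef struct {
    const int *data;
    size_t count;
} bounded_ints;

/* Unported original: nothing relates `i` to the size of `buf`,
   so an out-of-range index is undefined behavior. */
int read_unchecked(const int *buf, size_t i) {
    return buf[i];
}

/* Ported sketch: the access is checked against the declared bound.
   Returns false (and leaves *out untouched) instead of reading
   outside the buffer. */
bool read_checked(bounded_ints buf, size_t i, int *out) {
    if (buf.data == NULL || out == NULL || i >= buf.count) {
        return false;
    }
    *out = buf.data[i];
    return true;
}
```

Porting a call site then means both supplying the bound (wrapping the array and its length in `bounded_ints`) and rewriting the caller to handle the failure case, which mirrors the two concerns the abstract names: inferring annotations and refactoring code so the annotations fit.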
Taxonomy
Topics: Advanced Data Storage Technologies · Software Reliability and Analysis Research · Software Testing and Debugging Techniques
