LLMDFA: Analyzing Dataflow in Code with Large Language Models
Chengpeng Wang, Wuqi Zhang, Zian Su, Xiangzhe Xu, Xiaoheng Xie, Xiangyu Zhang

TL;DR
LLMDFA introduces a novel, compilation-free dataflow analysis framework powered by large language models, enabling customizable and reliable analysis of code, including uncompilable programs, by decomposing tasks and leveraging external tools.
Contribution
This work presents LLMDFA, a new LLM-based framework that performs dataflow analysis without compilation, addressing hallucinations and improving applicability for real-world, evolving analysis needs.
Findings
Achieves 87.10% precision and 80.77% recall in bug detection
Surpasses existing techniques with up to 0.35 F1 score improvement
Effective on both synthetic and real-world Android programs
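For readers who want to relate the precision and recall figures to an F1 score, the sketch below applies the standard harmonic-mean formula to the two numbers listed above. The resulting value is derived here for illustration only; it is not a figure quoted from the paper, and the paper's own F1 numbers may be aggregated differently.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Figures taken from the Findings list, expressed as fractions.
precision = 0.8710
recall = 0.8077

# Derived value, an illustration rather than a reported result.
f1 = f1_score(precision, recall)
print(f"F1 implied by the listed precision and recall: {f1:.3f}")
```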
Abstract
Dataflow analysis is a fundamental code analysis technique that identifies dependencies between program values. Traditional approaches typically necessitate successful compilation and expert customization, hindering their applicability and usability for analyzing uncompilable programs with evolving analysis needs in real-world scenarios. This paper presents LLMDFA, an LLM-powered compilation-free and customizable dataflow analysis framework. To address hallucinations for reliable results, we decompose the problem into several subtasks and introduce a series of novel strategies. Specifically, we leverage LLMs to synthesize code that outsources delicate reasoning to external expert tools, such as using a parsing library to extract program values of interest and invoking an automated theorem prover to validate path feasibility. Additionally, we adopt a few-shot chain-of-thought prompting…
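The idea of outsourcing value extraction to a parsing library can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the parsing library or show the synthesized code, so Python's built-in `ast` module stands in for it, Python stands in for the analyzed language, and the `SOURCE_CALLS` and `SINK_CALLS` sets are hypothetical. The point is that the extractor works on the syntax tree alone, so the analyzed snippet never needs to be built or have its imports resolved.

```python
import ast

# Hypothetical definitions for illustration: which calls count as
# sources (produce untrusted values) and sinks (consume them).
SOURCE_CALLS = {"input"}
SINK_CALLS = {"system"}

def call_name(node):
    """Return the simple name of a call target, e.g. 'system' for os.system(...)."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None

def extract_sources_and_sinks(code):
    """Collect (variable name, line number) pairs for sources and sinks."""
    sources, sinks = [], []
    for node in ast.walk(ast.parse(code)):
        # A source: a variable assigned directly from a source call.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            if call_name(node.value) in SOURCE_CALLS:
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        sources.append((target.id, node.lineno))
        # A sink: a variable passed as an argument to a sink call.
        if isinstance(node, ast.Call) and call_name(node) in SINK_CALLS:
            for arg in node.args:
                if isinstance(arg, ast.Name):
                    sinks.append((arg.id, node.lineno))
    return sources, sinks

# The snippet is only parsed, never executed or imported.
SNIPPET = """import os
cmd = input()
alias = cmd
os.system(alias)
"""

sources, sinks = extract_sources_and_sinks(SNIPPET)
print("sources:", sources)
print("sinks:", sinks)
```

The extractor only identifies candidate endpoints. Deciding whether `cmd` on line 2 actually reaches `alias` on line 4, and whether the connecting path is feasible, corresponds to the later subtasks the abstract mentions and is not covered by this sketch.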
Taxonomy
Topics: Business Process Modeling and Analysis · Semantic Web and Ontologies · Scientific Computing and Data Management
Methods: Lib
