Generating Complex Code Analyzers from Natural Language Questions
Amirmohammad Nazari, Sadra Sabouri, Wang Bill Zhu, Robin Jia, Souti Chattopadhyay, Mukund Raghothaman

TL;DR
Merlin is a system that combines large language models with CodeQL to answer complex, free-form questions about large codebases, improving bug detection and developer efficiency.
Contribution
The paper introduces Merlin, a novel system integrating LLMs with CodeQL using RAG-based iterative query generation and self-testing to enhance code analysis capabilities.
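As a rough illustration of what retrieval-augmented, iterative query generation with self-testing could look like, here is a minimal sketch. It is not Merlin's implementation: the function names (`retrieve_examples`, `generate_with_self_test`), the word-overlap retriever, and the `generate` and `self_test` callbacks are all hypothetical stand-ins for an LLM call and a CodeQL test run, neither of which is invoked here.

```python
from typing import Callable, List, Optional, Tuple

# A retrieval example pairs a natural-language description with a query string.
Example = Tuple[str, str]


def retrieve_examples(question: str, corpus: List[Example], k: int = 2) -> List[Example]:
    """Toy retriever (assumption): rank examples by word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda ex: len(q_words & set(ex[0].lower().split())),
        reverse=True,
    )
    return ranked[:k]


def generate_with_self_test(
    question: str,
    corpus: List[Example],
    generate: Callable[[str, List[Example], str], str],
    self_test: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Optional[str]:
    """Generate a candidate query, test it, and retry with the test feedback.

    `generate` stands in for an LLM call; `self_test` stands in for running
    the candidate query against test programs. Both are hypothetical.
    Returns the first candidate that passes, or None if none passes.
    """
    examples = retrieve_examples(question, corpus)
    feedback = ""  # empty on the first round
    for _ in range(max_rounds):
        candidate = generate(question, examples, feedback)
        passed, feedback = self_test(candidate)
        if passed:
            return candidate
    return None
```

The loop stops at the first candidate that passes its tests and otherwise feeds the failure message back into the next generation round; the round limit keeps a persistently failing question from looping forever.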
Findings
Merlin rediscovered most of the software issues reported by previous approaches.
Merlin identified additional issues that remained undetected by previous approaches.
User studies showed Merlin increased task accuracy and reduced completion time.
Abstract
Many software development tasks, such as implementing features and fixing bugs, begin with developers posing questions about a codebase. However, answering questions about codebases that span millions of lines of code across thousands of files is non-trivial. Standard tools like grep cannot answer questions requiring semantic or inter-procedural reasoning, and large language models (LLMs) struggle with large codebases due to resource and context constraints. In this paper, we present Merlin, a new system for answering free-form questions that require analytical reasoning about code. Merlin integrates an LLM with CodeQL, a program analysis framework that supports expressive queries over large codebases. We face two principal challenges in the design of such systems: First, program analysis queries are diverse and semantically complex; as a result, even syntactically well-formed queries…
