QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities

Claire Wang; Ziyang Li; Saikat Dutta; Mayur Naik

arXiv:2511.08462 · cs.CR · March 26, 2026

TL;DR

QLCoder automatically generates security vulnerability detection queries in CodeQL from CVE metadata using an LLM-based synthesis loop with structured feedback, improving accuracy over baseline methods.

Contribution

Introduces QLCoder, a novel framework that synthesizes CodeQL queries from CVE data using an LLM with structured feedback, enhancing static security analysis capabilities.

Findings

1. Correctly detects vulnerabilities in 53.4% of CVEs.
2. Outperforms the Claude Code synthesis baseline (10%).
3. Works across 111 Java projects.

Abstract

Static analysis tools provide a powerful means to detect security vulnerabilities by specifying queries that encode vulnerable code patterns. However, writing such queries is challenging and requires diverse expertise in security and program analysis. To address this challenge, we present QLCoder, an agentic framework that automatically synthesizes queries in CodeQL, a powerful static analysis engine, directly from given CVE metadata. QLCoder embeds an LLM in a synthesis loop with execution feedback, while constraining its reasoning using a custom MCP (Model Context Protocol) interface that allows structured interaction with a Language Server Protocol (LSP) server (for syntax guidance) and a RAG database (for semantic retrieval of queries and documentation). This approach allows QLCoder to generate syntactically and semantically valid security queries. We evaluate QLCoder on 176 existing CVEs across 111 Java projects.…
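
The abstract describes the synthesis loop only at a high level. The Python sketch below illustrates a generic generate-execute-refine loop of that shape; it is not QLCoder's implementation. `propose` stands in for the LLM agent (with its LSP and retrieval tools) and `evaluate` for compiling and running a candidate query with CodeQL. The names `Feedback` and `synthesize`, its fields, and the success criterion (the query fires on the vulnerable version but not on the patched one) are hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feedback:
    """Structured result of compiling and running one candidate query (hypothetical shape)."""
    compiles: bool
    alerts_on_vulnerable: bool  # query reports the flaw in the vulnerable version
    alerts_on_patched: bool     # query still fires after the fix (a false positive)
    message: str = ""           # e.g. compiler errors or a summary of query results

    @property
    def success(self) -> bool:
        # Assumed criterion: valid query that flags the vulnerable code but not the patch.
        return self.compiles and self.alerts_on_vulnerable and not self.alerts_on_patched


def synthesize(
    cve_metadata: str,
    propose: Callable[[str, List[Feedback]], str],  # stands in for the LLM agent
    evaluate: Callable[[str], Feedback],            # stands in for running CodeQL
    max_iters: int = 5,
) -> Optional[str]:
    """Generate-execute-refine loop: return the first query that passes, else None."""
    history: List[Feedback] = []
    for _ in range(max_iters):
        query = propose(cve_metadata, history)  # LLM drafts a query given past feedback
        feedback = evaluate(query)              # compile and execute the candidate
        if feedback.success:
            return query
        history.append(feedback)                # feed the structured result back in
    return None
```

In a real setup, `propose` would wrap the LLM with access to the syntax and retrieval tools, and `evaluate` would run the CodeQL CLI against databases built from the vulnerable and patched project versions; both are left abstract here so the loop structure stays visible.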

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. Tackles a practical and timely challenge in security automation.
2. Clear design: LSP for syntax, RAG for semantics, feedback loop for correctness.
3. Strong results across real-world projects; large margin over baselines.
4. Good ablations showing the importance of LSP + retrieval.

Weaknesses

1. Only Java CVEs are supported; it is unclear how easily this extends to other CodeQL languages (C/C++, JS, Python). The authors mention replacing the RAG database, but actual cross-language results are not shown.
2. The system leans on Claude’s coding agent and the CodeQL LSP. It is unclear how reproducible this is without Anthropic tooling or whether open-model support is viable (the Gemini/GPT baselines fail).
3. Iterative refinement with code execution can be slow. Wall-clock time, computational budget, and iteration cos

Reviewer 02 · Rating 8 · Confidence 4

Strengths

+ Applying script generation to vulnerability detection is novel, and with the help of RAG, the approach can mitigate the issues with low-resource static analysis query languages.
+ The approach is highly effective, outperforming vanilla LLMs and the original CodeQL analysis by a large margin.

Weaknesses

- The evaluation shows that FineNib can successfully generate good queries. However, evaluating only a single LLM makes it somewhat incomplete. Claude Sonnet 4 is a very powerful, expensive, closed-source LLM; knowing the effectiveness of FineNib with a weaker open-source LLM would make the evaluation more complete.
- Since agentic approaches tend to be more costly due to their iterative refinement process, a cost vs. effectiveness analysis could inform the reader of the cost per performance ga

Reviewer 03 · Rating 2 · Confidence 4

Strengths

I think there are some interesting ideas in this paper when it comes to neurosymbolic program analysis. I think that using execution feedback to guide the LLM to synthesize a query is a very good idea. There are many parts of program analysis that are tedious and difficult (writing CodeQL queries is a great example of this!). By incorporating an LLM into the loop and providing some feedback, I think we can make program analyses much more accessible and easier to write. I also liked the discussio

Weaknesses

Unfortunately, where this paper falls apart for me is in the motivation. I don't understand why we need a query for one *specific* CVE, especially given the fact that that CVE has been patched. To my understanding, IRIS and other tools write queries that detect a general weakness pattern, like code injection (CWE-94). These queries can be used to detect that weakness pattern in other projects. But a CVE is just *one* instance of a CWE: "CVE-2024-12345: Buffer overflow in `foo()` of `libbar` 2.3.

Taxonomy

Topics: Security and Verification in Computing · Web Application Security Vulnerabilities · Software Testing and Debugging Techniques