Bridging Code Property Graphs and Language Models for Program Analysis
Ahmed Lekssays

TL;DR
This paper presents codebadger, a tool that integrates Code Property Graphs with Large Language Models to enable scalable, semantic program analysis across large codebases for vulnerability detection and patching.
Contribution
It introduces codebadger, a novel server that combines CPGs with LLMs, allowing targeted exploration and analysis of large codebases without exhaustive file reading.
Findings
Successfully navigated an 8,000-method codebase for memory safety analysis
Discovered and exploited a new buffer overflow in libtiff
Generated a correct patch for CVE-2025-6021 on the first attempt
Abstract
Large Language Models (LLMs) face critical challenges when analyzing security vulnerabilities in real-world codebases: token limits prevent loading entire repositories, code embeddings fail to capture interprocedural data flows, and LLMs struggle to generate complex static analysis queries. These limitations force existing approaches to operate on isolated code snippets, missing vulnerabilities that span multiple functions and files. We introduce codebadger, an open-source Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) engine with LLMs. Rather than requiring LLMs to generate complex CPG queries, codebadger provides high-level tools for program slicing, taint tracking, data flow analysis, and semantic code navigation, enabling targeted exploration of large codebases without exhaustive file reading. We demonstrate its effectiveness through three use…
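To make the idea of taint tracking over a graph concrete, here is a minimal sketch in Python. It is not codebadger's or Joern's API: the graph, the node names, and the find_taint_path helper are all hypothetical, invented for illustration. It only shows the general principle that a source-to-sink question can be answered by a graph reachability query rather than by reading every file.

```python
from collections import deque

# Hypothetical toy data-flow graph (assumption, not taken from the paper):
# an edge A -> B means "data produced at A flows into B".
DATA_FLOW = {
    "read_input": ["parse_header"],
    "parse_header": ["compute_size", "log_header"],
    "compute_size": ["memcpy_call"],
    "log_header": [],
    "memcpy_call": [],
}


def find_taint_path(graph, source, sink):
    """Return one shortest data-flow path from source to sink, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == sink:
            return path
        for successor in graph.get(node, []):
            if successor not in visited:
                visited.add(successor)
                queue.append(path + [successor])
    return None


# Does untrusted input reach the (hypothetical) memcpy call site?
print(find_taint_path(DATA_FLOW, "read_input", "memcpy_call"))
```

A real CPG also encodes syntax, control flow, and call edges, so a production query is considerably richer than this breadth-first search. The sketch only illustrates why a graph lets an analysis follow a flow across functions without loading unrelated code.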
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Security and Verification in Computing
