Bridging Code Property Graphs and Language Models for Program Analysis

Ahmed Lekssays

arXiv:2603.24837 · cs.CR · March 27, 2026

TL;DR

This paper presents codebadger, a tool that integrates Code Property Graphs with Large Language Models to enable scalable, semantic program analysis across large codebases for vulnerability detection and patching.

Contribution

It introduces codebadger, an open-source Model Context Protocol (MCP) server that combines Code Property Graphs (CPGs) with LLMs. This lets an LLM explore and analyze large codebases in a targeted way instead of reading every file.
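Under the Model Context Protocol, a client such as an LLM agent invokes a server tool by sending a JSON-RPC 2.0 request with the `tools/call` method. The minimal sketch below builds such a request. The tool name `find_taint_flows` and its `source`/`sink` arguments are hypothetical placeholders, because the summary does not list codebadger's actual tool names or parameters.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request that invokes an MCP tool via tools/call."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool name and arguments, for illustration only;
# they are not taken from codebadger's documented interface.
msg = make_tool_call(1, "find_taint_flows", {"source": "recv", "sink": "memcpy"})
print(msg)
```

In this design the LLM only chooses a tool and fills in a few arguments. The server is responsible for translating that choice into a graph query, so the model does not have to write the query itself.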

Findings

01

Navigated an 8,000-method codebase to analyze it for memory-safety issues

02

Discovered and exploited a new buffer overflow in libtiff

03

Generated a correct patch for CVE-2025-6021 on the first attempt

Abstract

Large Language Models (LLMs) face critical challenges when analyzing security vulnerabilities in real-world codebases. Token limits prevent loading entire repositories, code embeddings fail to capture interprocedural data flows, and LLMs struggle to generate complex static analysis queries. These limitations force existing approaches to operate on isolated code snippets, missing vulnerabilities that span multiple functions and files. We introduce codebadger, an open-source Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) engine with LLMs. Rather than requiring LLMs to generate complex CPG queries, codebadger provides high-level tools for program slicing, taint tracking, data flow analysis, and semantic code navigation, enabling targeted exploration of large codebases without exhaustive file reading. We demonstrate its effectiveness through three use…
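The sketch below illustrates interprocedural taint tracking in general terms. It runs a breadth-first search over data-flow edges in a tiny, hand-written graph whose nodes are labeled `function:statement`. The graph, node labels, and `taint_paths` helper are all hypothetical, and this is not how Joern or codebadger implement their analyses. It only shows why following graph edges can surface a source-to-sink flow that crosses function boundaries, which reading isolated snippets would miss.

```python
from collections import deque

# Hypothetical toy data-flow graph: each node is "function:statement" and each
# edge means the value defined at the first node flows into the second.
DATA_FLOW = {
    "parse:buf = recv(sock)": ["parse:len = buf[0]", "parse:copy(buf, len)"],
    "parse:len = buf[0]": ["parse:copy(buf, len)"],
    "parse:copy(buf, len)": ["copy:memcpy(dst, src, n)"],
    "copy:memcpy(dst, src, n)": [],
    "main:log(msg)": [],
}


def taint_paths(graph, source, sinks):
    """Return one shortest data-flow path from source to each reachable sink (BFS)."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if succ not in parent:  # visit each node once
                parent[succ] = node
                queue.append(succ)
    paths = {}
    for sink in sinks:
        if sink in parent:  # sink is reachable from the tainted source
            path, cur = [], sink
            while cur is not None:  # walk parent links back to the source
                path.append(cur)
                cur = parent[cur]
            paths[sink] = list(reversed(path))
    return paths


flows = taint_paths(
    DATA_FLOW,
    "parse:buf = recv(sock)",
    ["copy:memcpy(dst, src, n)", "main:log(msg)"],
)
for sink, path in flows.items():
    print(" -> ".join(path))
```

Here the untrusted `recv` result reaches `memcpy` in a different function, while `log` is never reached. A real CPG engine builds these edges from the parsed program instead of a hand-written dictionary.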

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Security and Verification in Computing