Reliable Graph-RAG for Codebases: AST-Derived Graphs vs LLM-Extracted Knowledge Graphs
Manideep Reddy Chinthareddy

TL;DR
This paper compares AST-derived deterministic knowledge graphs with LLM-extracted graphs for codebase retrieval, showing that AST-based graphs offer more reliable coverage, lower costs, and better multi-hop reasoning in software engineering tasks.
Contribution
It introduces a benchmark for evaluating retrieval pipelines using AST-derived and LLM-generated knowledge graphs on Java codebases, highlighting the advantages of deterministic graphs.
Findings
AST-derived graphs build quickly and cover more code accurately.
Deterministic graphs outperform LLM-generated graphs in correctness and coverage.
The AST-based approach has lower indexing cost and supports better multi-hop reasoning.
Abstract
Retrieval-Augmented Generation for software engineering often relies on vector similarity search, which captures topical similarity but can fail on multi-hop architectural reasoning such as controller to service to repository chains, interface-driven wiring, and inheritance. This paper benchmarks three retrieval pipelines on Java codebases (Shopizer, with additional runs on ThingsBoard and OpenMRS Core): (A) vector-only No-Graph RAG, (B) an LLM-generated knowledge graph RAG (LLM-KB), and (C) a deterministic AST-derived knowledge graph RAG (DKB) built with Tree-sitter and bidirectional traversal. Using 15 architecture and code-tracing queries per repository, we measure indexing time, query latency, corpus coverage, cost, and answer correctness. DKB builds its graph in seconds, while LLM-KB requires much longer graph generation. LLM-KB also shows indexing incompleteness: on Shopizer,…
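The bidirectional traversal mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the edge list is hand-written to stand in for what a Tree-sitter pass might extract, the class names (`ProductController`, `ProductServiceImpl`, and so on) are hypothetical, and the function names `build_adjacency` and `expand` are assumptions introduced here.

```python
from collections import defaultdict, deque

# Hypothetical (source, relation, target) edges; a real pipeline would
# derive these from the AST rather than hard-code them.
EDGES = [
    ("ProductController", "calls", "ProductService"),
    ("ProductServiceImpl", "implements", "ProductService"),
    ("ProductServiceImpl", "calls", "ProductRepository"),
    ("ProductRepository", "extends", "JpaRepository"),
]


def build_adjacency(edges):
    """Index every edge in both directions so traversal can follow either."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for src, _relation, dst in edges:
        forward[src].add(dst)
        backward[dst].add(src)
    return forward, backward


def expand(seed, edges, max_hops):
    """Breadth-first expansion from a seed node along outgoing and incoming
    edges. Returns a dict mapping each reached node to its hop distance."""
    forward, backward = build_adjacency(edges)
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbor in forward[node] | backward[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


if __name__ == "__main__":
    for name, distance in sorted(expand("ProductService", EDGES, 2).items()):
        print(distance, name)
```

In this toy graph the interface `ProductService` has no outgoing edges, so a forward-only walk from it would reach nothing. Following incoming edges as well surfaces the controller that calls it and the implementation that wires it to the repository, which is the kind of interface-driven chain the abstract describes.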
Taxonomy
Topics: Advanced Graph Neural Networks · Software Engineering Research · Graph Theory and Algorithms
