Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP

Martin Vogel; Falk Meyer-Eschenbach; Severin Kohler; Elias Grünewald; Felix Balzer

arXiv:2603.27277·cs.SE·March 31, 2026

TL;DR

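Codebase-Memory is an open-source system that uses Tree-Sitter to build a persistent knowledge graph of a codebase, covering 66 languages, and serves it to LLM agents via MCP. It trades some answer quality (83% versus 92% for a file-exploration agent) for ten times fewer tokens and 2.1 times fewer tool calls.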

Contribution

It introduces a Tree-Sitter-based method for constructing a persistent code knowledge graph that is made available to LLM agents via the Model Context Protocol (MCP), giving them structured, multi-language access to a codebase.

Findings

01

Achieves 83% answer quality, versus 92% for a file-exploration agent, at ten times fewer tokens and 2.1 times fewer tool calls.

02

Matches or exceeds the file-exploration agent on graph-native queries such as hub detection and caller ranking on 19 of 31 languages.

03

Supports 66 languages through a multi-phase, parallel pipeline.

Abstract

Large Language Model (LLM) coding agents typically explore codebases through repeated file-reading and grep-searching, consuming thousands of tokens per query without structural understanding. We present Codebase-Memory, an open-source system that constructs a persistent, Tree-Sitter-based knowledge graph via the Model Context Protocol (MCP), parsing 66 languages through a multi-phase pipeline with parallel worker pools, call-graph traversal, impact analysis, and community discovery. Evaluated across 31 real-world repositories, Codebase-Memory achieves 83% answer quality versus 92% for a file-exploration agent, at ten times fewer tokens and 2.1 times fewer tool calls. For graph-native queries such as hub detection and caller ranking, it matches or exceeds the explorer on 19 of 31 languages.
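
To make the graph-native queries named in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a call graph has already been extracted as (caller, callee) pairs; the edge list and the function names `rank_hubs` and `impacted_by` are hypothetical and chosen for illustration only. Hub detection is approximated here by counting distinct callers, and impact analysis by a breadth-first walk over reverse call edges.

```python
from collections import defaultdict, deque

# Hypothetical call edges (caller, callee). A real system would derive these
# from parsed source files; here they are hard-coded for illustration.
CALLS = [
    ("main", "load_config"),
    ("main", "run"),
    ("run", "parse"),
    ("run", "log"),
    ("parse", "log"),
    ("load_config", "log"),
]


def build_reverse_index(edges):
    """Map each callee to the set of functions that call it directly."""
    callers = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)
    return callers


def rank_hubs(edges):
    """Rank functions by number of distinct direct callers (ties broken by name)."""
    callers = build_reverse_index(edges)
    counts = ((name, len(who)) for name, who in callers.items())
    return sorted(counts, key=lambda item: (-item[1], item[0]))


def impacted_by(edges, target):
    """Return every function that reaches `target` through one or more calls."""
    callers = build_reverse_index(edges)
    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


if __name__ == "__main__":
    print(rank_hubs(CALLS)[0])           # most-called function and its caller count
    print(sorted(impacted_by(CALLS, "parse")))  # functions affected by a change to parse
```

Once the edges are stored, each query is a single lookup or traversal over the graph, which is the kind of question a file-reading agent would otherwise answer with repeated grep and file-open calls.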
