TL;DR
RepoDoc is a knowledge graph-based framework that automates comprehensive, modular documentation generation and efficient incremental updates for large codebases, significantly improving coverage, speed, and accuracy.
Contribution
It introduces a novel semantic knowledge graph approach for documentation that enables modular, cross-referenced outputs and targeted incremental updates, outperforming existing methods.
Findings
API coverage increased by 32.5%
Documentation generation is 3x faster with 85% fewer tokens
Incremental update time reduced by 73% with higher accuracy
Abstract
Maintaining up-to-date, comprehensive documentation for large codebases is a persistent challenge. Recent progress in automated documentation has moved from template-based rules to large language models (LLMs), yet existing tools still process source code as flat fragments, producing isolated documents that lack semantic structure. This design also leads to excessive token consumption and slow generation, while failing to capture how code changes propagate across dependencies. We propose RepoDoc, a system that uses a repository knowledge graph (RepoKG) as the semantic foundation for the entire documentation lifecycle. Our framework consists of three stages: (1) RepoKG construction, which extracts code entities and their relationships; (2) module clustering, which groups code into functionally cohesive, hierarchical units; and (3) skillful agent-based generation, which queries the…
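To make the idea of a repository knowledge graph more concrete, the sketch below builds a very small one and uses it to find which documentation entries a code change would touch. It is an illustration under stated assumptions, not RepoDoc's implementation: the graph holds only top-level Python functions and direct call edges, and the names `build_call_graph`, `affected_by_change`, and `SAMPLE` are hypothetical. The abstract describes RepoKG only at a high level, so the entity and relationship types here are placeholders.

```python
import ast
from collections import defaultdict, deque


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the top-level functions it calls directly.

    Assumption: entities are top-level functions and the only relationship
    is "calls". A real repository graph would carry many more node and
    edge types (classes, modules, imports, inheritance, and so on).
    """
    tree = ast.parse(source)
    functions = {
        node.name: node for node in tree.body if isinstance(node, ast.FunctionDef)
    }
    graph: dict[str, set[str]] = {name: set() for name in functions}
    for name, node in functions.items():
        for child in ast.walk(node):
            # Only plain-name calls such as load(path) are tracked here.
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                if child.func.id in functions:
                    graph[name].add(child.func.id)
    return graph


def affected_by_change(graph: dict[str, set[str]], changed: str) -> set[str]:
    """Return the changed entity plus every function that transitively calls it."""
    # Invert the call edges so we can walk from callee to caller.
    callers: dict[str, set[str]] = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            callers[callee].add(caller)

    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers[current]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


# Hypothetical four-function module used only for this illustration.
SAMPLE = '''
def load(path):
    return open(path).read()

def parse(path):
    return load(path).split()

def report(path):
    return len(parse(path))

def unrelated():
    return 42
'''

graph = build_call_graph(SAMPLE)
stale = affected_by_change(graph, "load")
print(sorted(stale))
```

In this toy setting, a change to `load` marks `load`, `parse`, and `report` as needing regenerated documentation, while `unrelated` is left alone. Walking reverse dependency edges like this is one simple way to restrict an incremental update to the entities a change can reach, rather than reprocessing the whole repository as flat fragments.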
