HCAG: Hierarchical Abstraction and Retrieval-Augmented Generation on Theoretical Repositories with LLMs
Yusen Wu, Xiaotie Deng

TL;DR
HCAG introduces a hierarchical, planning-based framework for code generation from theoretical repositories, significantly improving architectural coherence and code quality in complex, theory-driven codebases.
Contribution
It presents a novel hierarchical abstraction and retrieval-augmented generation approach that links theory, architecture, and implementation for complex codebases.
Findings
Outperforms baseline methods in code quality and architectural coherence.
Creates a large, aligned theory-implementation dataset for LLM domain adaptation.
Achieves cost-optimal hierarchical abstraction with adaptive node compression.
Abstract
Existing Retrieval-Augmented Generation (RAG) methods for code struggle to capture the high-level architectural patterns and cross-file dependencies inherent in complex, theory-driven codebases, such as those in algorithmic game theory (AGT), leading to a persistent semantic and structural gap between abstract concepts and executable implementations. To address this challenge, we propose Hierarchical Code/Architecture-guided Agent Generation (HCAG), a framework that reformulates repository-level code generation as a structured, planning-oriented process over hierarchical knowledge. HCAG adopts a two-phase design: an offline hierarchical abstraction phase that recursively parses code repositories and aligned theoretical texts to construct a multi-resolution semantic knowledge base explicitly linking theory, architecture, and implementation; and an online hierarchical retrieval and…
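The two-phase design described in the abstract can be illustrated with a toy sketch: a hand-built tree stands in for the offline multi-resolution knowledge base, and a coarse-to-fine descent stands in for online hierarchical retrieval. Everything here is an assumption made for illustration, not the authors' implementation: the `Node` type, the word-overlap `score`, the `beam` parameter, and the example repository are all hypothetical, and the planning agent and node compression mentioned on this page are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of a hypothetical hierarchy: repository, module, or function."""
    name: str
    summary: str                       # abstract description at this resolution
    code: str = ""                     # only leaves carry implementation text
    children: list[Node] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens; a stand-in for a real embedding."""
    return set(text.lower().split())


def score(query: str, node: Node) -> float:
    """Toy relevance: Jaccard overlap between the query and the node summary."""
    q, s = tokens(query), tokens(node.summary)
    union = q | s
    return len(q & s) / len(union) if union else 0.0


def retrieve(root: Node, query: str, beam: int = 1) -> list[Node]:
    """Descend from coarse to fine, keeping the `beam` best children per level."""
    frontier, path = [root], [root]
    while frontier:
        candidates = [c for n in frontier for c in n.children]
        if not candidates:
            break
        ranked = sorted(candidates, key=lambda c: score(query, c), reverse=True)
        frontier = ranked[:beam]
        path.extend(frontier)
    return path


# Hypothetical knowledge base for an algorithmic game theory repository.
root = Node("agt-repo", "algorithmic game theory repository", children=[
    Node("auctions", "auction mechanisms and payment rules", children=[
        Node("vickrey", "second price sealed bid auction payment",
             code="# implementation text"),
        Node("first_price", "first price sealed bid auction",
             code="# implementation text"),
    ]),
    Node("equilibria", "nash equilibrium solvers for matrix games", children=[
        Node("lemke_howson", "lemke howson nash equilibrium for bimatrix games",
             code="# implementation text"),
    ]),
])

path = retrieve(root, "nash equilibrium for bimatrix games")
print(" > ".join(n.name for n in path))
```

The returned path keeps every level from repository to leaf, so a generator prompted with it would see architectural context as well as the matching implementation. A flat retriever over code chunks alone would return only the leaf.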
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Model-Driven Software Engineering Techniques
