Hierarchical Embedding Fusion for Retrieval-Augmented Code Generation
Nikita Sorokin, Ivan Sedykh, Valentin Malykh

TL;DR
This paper introduces Hierarchical Embedding Fusion (HEF), a two-stage repository representation method for code completion. It enables low-latency, repository-aware code generation with exact-match accuracy comparable to snippet-based retrieval baselines.
Contribution
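The paper proposes HEF, a hierarchical dense caching approach: repository chunks are compressed offline into a reusable hierarchy of dense vectors, and at inference time a small number of retrieved vectors are mapped to a fixed budget of learned pseudo-tokens. This decouples online inference cost from repository size and reduces the noise introduced by long retrieved contexts.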
Findings
HEF achieves exact-match accuracy comparable to snippet-based retrieval.
HEF operates at sub-second latency on a single GPU.
HEF reduces median end-to-end latency by a factor of 13 to 26.
Abstract
Retrieval-augmented code generation often conditions the decoder on large retrieved code snippets. This ties online inference cost to repository size and introduces noise from long contexts. We present Hierarchical Embedding Fusion (HEF), a two-stage approach to repository representation for code completion. First, an offline cache compresses repository chunks into a reusable hierarchy of dense vectors using a small fuser model. Second, an online interface maps a small number of retrieved vectors into learned pseudo-tokens that are consumed by the code generator. This replaces thousands of retrieved tokens with a fixed pseudo-token budget while preserving access to repository-level information. On RepoBench and RepoEval, HEF with a 1.8B-parameter pipeline achieves exact-match accuracy comparable to snippet-based retrieval baselines, while operating at sub-second median latency on a…
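The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the hashed bag-of-words `toy_embed` stands in for the fuser model, mean pooling stands in for however the hierarchy is actually fused, the random linear maps in `make_projection` stand in for the learned pseudo-token interface, and every name and constant (`DIM`, `TOP_K`, `PSEUDO_PER_VEC`, `build_cache`, `retrieve`, `to_pseudo_tokens`) is hypothetical. The sketch only shows the shape of the idea: an offline cache of chunk, file, and repository vectors, and an online step whose output size is fixed regardless of repository size.

```python
import hashlib
import math
import random

# All constants below are hypothetical, chosen only for illustration.
DIM = 16             # width of each cached vector
TOP_K = 3            # number of vectors retrieved per query
PSEUDO_PER_VEC = 2   # pseudo-tokens produced per retrieved vector


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else v


def toy_embed(text, dim=DIM):
    """Stand-in for the fuser model: a deterministic hashed bag-of-words vector."""
    vec = [0.0] * dim
    for tok in text.split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    return normalize(vec)


def mean_pool(vectors):
    """Assumed fusion rule: parent vector = normalized mean of its children."""
    dim = len(vectors[0])
    return normalize([sum(v[i] for v in vectors) / len(vectors) for i in range(dim)])


def build_cache(repo):
    """Offline stage: turn {path: [chunk, ...]} into (level, name, vector) entries."""
    cache, file_vecs = [], []
    for path, chunks in repo.items():
        chunk_vecs = [toy_embed(c) for c in chunks]
        for i, v in enumerate(chunk_vecs):
            cache.append(("chunk", f"{path}#{i}", v))
        file_vec = mean_pool(chunk_vecs)
        cache.append(("file", path, file_vec))
        file_vecs.append(file_vec)
    cache.append(("repo", "<root>", mean_pool(file_vecs)))
    return cache


def retrieve(cache, query, k=TOP_K):
    """Online stage, step 1: pick the k cached vectors most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(cache, key=lambda e: -sum(a * b for a, b in zip(q, e[2])))
    return ranked[:k]


def make_projection(dim=DIM, n=PSEUDO_PER_VEC, seed=0):
    """Random dim x dim matrices standing in for a learned projection."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [[[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]
            for _ in range(n)]


def to_pseudo_tokens(entries, proj):
    """Online stage, step 2: map each retrieved vector to a few pseudo-token vectors."""
    tokens = []
    for _, _, v in entries:
        for W in proj:
            tokens.append([sum(w * x for w, x in zip(row, v)) for row in W])
    return tokens


# Toy repository (contents are made up for the example).
repo = {
    "utils/io.py": ["def read_json(path): ...", "def write_json(path, obj): ..."],
    "models/user.py": ["class User: name email",
                       "def load_user(path): return read_json(path)"],
}
cache = build_cache(repo)                                 # done once, offline
hits = retrieve(cache, "load_user read_json path")        # per request
pseudo = to_pseudo_tokens(hits, make_projection())
print(len(cache), "cached vectors ->", len(pseudo), "pseudo-tokens of width", len(pseudo[0]))
```

In this sketch the generator would receive `TOP_K * PSEUDO_PER_VEC` vectors per request no matter how many chunks the repository contains, which is the property the abstract attributes to the fixed pseudo-token budget. How the real pipeline trains the fuser and the interface is not specified in the text above.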
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Web Data Mining and Analysis
