CoRoVA: Compressed Representations for Vector-Augmented Code Completion
Daria Cherniuk, Nikita Sukhorukov, Danil Gusak, Nikita Sushko, Danil Sivtsov, Elena Tutubalina, Evgeny Frolov

TL;DR
CoRoVA introduces a method to compress context in retrieval-augmented code generation, significantly reducing latency and improving prediction quality with minimal additional computation.
Contribution
It presents a compression framework that turns retrieved context into compact, semantically rich representations that remain interpretable to code LLMs, making code completion both faster and more accurate.
Findings
Achieves 20-38% reduction in time-to-first-token compared to uncompressed retrieval-augmented generation.
Requires only training a small projector module, adding negligible latency.
Improves generation quality while keeping prompt size minimal.
Abstract
Retrieval-augmented generation has emerged as one of the most effective approaches for code completion enhancement, especially when repository-level context is important. However, adding this extra retrieved context significantly increases sequence length, raises prefill cost, and degrades time-to-first-token (TTFT), which slows down inference -- a critical limitation for interactive settings such as IDEs. In this work, we introduce CoRoVA, a framework that compresses context into compact, semantically rich representations that remain interpretable to code LLMs. This improves generation quality while reducing prompt augmentation to only a few compressed single-token vectors. Our approach requires training only a small projector module and introduces negligible additional latency, yet it significantly improves the prediction quality of code LLMs. Our experiments show that CoRoVA enables…
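Illustrative sketch
The abstract describes replacing retrieved context with a few single-token vectors produced by a small trained projector. The NumPy sketch below shows only the general shape of that idea and is not the authors' implementation. Everything in it is an assumption made for illustration: the dimensions, the two-layer MLP architecture, the random stand-in embeddings, and the names Projector and build_compressed_prompt. Weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
ENCODER_DIM = 256  # width of a hypothetical chunk encoder's output
HIDDEN_DIM = 384   # width of the projector's hidden layer
LLM_DIM = 512      # width of the code LLM's token embeddings


class Projector:
    """Hypothetical two-layer MLP: one chunk embedding in, one LLM-space vector out."""

    def __init__(self, in_dim, hidden_dim, out_dim, rng):
        self.w1 = rng.normal(0.0, 0.02, size=(in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.02, size=(hidden_dim, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, x):
        hidden = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU activation
        return hidden @ self.w2 + self.b2


def build_compressed_prompt(chunk_embeddings, prompt_token_embeddings, projector):
    """Prepend one projected vector per retrieved chunk to the prompt's token embeddings."""
    compressed = projector(chunk_embeddings)  # shape: (num_chunks, LLM_DIM)
    return np.concatenate([compressed, prompt_token_embeddings], axis=0)


# Toy setup: 4 retrieved chunks of 200 tokens each, plus a 50-token completion prompt.
num_chunks, tokens_per_chunk, prompt_len = 4, 200, 50
chunk_embeddings = rng.normal(size=(num_chunks, ENCODER_DIM))    # stand-in encoder output
prompt_token_embeddings = rng.normal(size=(prompt_len, LLM_DIM))  # stand-in LLM embeddings

projector = Projector(ENCODER_DIM, HIDDEN_DIM, LLM_DIM, rng)
inputs = build_compressed_prompt(chunk_embeddings, prompt_token_embeddings, projector)

uncompressed_len = num_chunks * tokens_per_chunk + prompt_len  # plain RAG prompt length
compressed_len = inputs.shape[0]                               # one vector per chunk + prompt
print(f"sequence length: {uncompressed_len} -> {compressed_len}")
```

In this toy setup the sequence the LLM must prefill shrinks from 850 positions to 54, which is the mechanism by which shorter prompts can lower time-to-first-token. The actual savings reported for CoRoVA depend on the real models and retrieval setup, which this sketch does not reproduce.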
