FastCoder: Accelerating Repository-level Code Generation via Efficient Retrieval and Verification
Qianhui Zhao, Li Zhang, Fang Liu, Xiaoli Lian, Qiaoyuanhe Meng, Ziqian Jiao, Zetong Zhou, Jia Li, Lin Shi

TL;DR
FastCoder is a specialized inference acceleration method for repository-level code generation that leverages retrieval and caching to significantly improve speed without sacrificing code quality.
Contribution
FastCoder introduces a retrieval-based acceleration approach tailored for code generation, combining multi-source datastores, controlled retrieval timing, and cache strategies.
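To make the retrieve-then-verify idea concrete, here is a minimal sketch of generic retrieval-based draft decoding. It is not the FastCoder implementation. It assumes a single n-gram datastore and a toy deterministic model, and it leaves out the multi-source datastores, retrieval timing control, and caching described above. The names `build_datastore`, `generate`, `toy_next_token`, `ctx_len`, and `draft_len` are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Token = str
Datastore = Dict[Tuple[Token, ...], List[Token]]


def build_datastore(corpus: List[List[Token]], ctx_len: int = 2,
                    draft_len: int = 4) -> Datastore:
    """Map each ctx_len-token context to the tokens that followed it (first hit wins)."""
    store: Datastore = {}
    for seq in corpus:
        for i in range(ctx_len, len(seq)):
            store.setdefault(tuple(seq[i - ctx_len:i]), seq[i:i + draft_len])
    return store


def generate(prompt: List[Token], next_token: Callable[[List[Token]], Token],
             store: Datastore, ctx_len: int = 2, max_new: int = 16,
             eos: Token = "<eos>") -> Tuple[List[Token], int]:
    """Greedy decoding with retrieved drafts; prompt must be non-empty.

    Each loop iteration stands in for one verification step. A real system
    would score every draft position in a single parallel forward pass;
    this toy calls next_token once per position instead.
    """
    out = list(prompt)
    steps = 0
    limit = len(prompt) + max_new
    while len(out) < limit and out[-1] != eos:
        steps += 1
        draft = store.get(tuple(out[-ctx_len:]), [])
        # Accept the longest draft prefix that matches the model's own choices.
        for tok in draft:
            if len(out) >= limit or out[-1] == eos or next_token(out) != tok:
                break
            out.append(tok)
        # Append the model's own token (a correction, or a bonus after a full match).
        if len(out) < limit and out[-1] != eos:
            out.append(next_token(out))
    return out, steps


# Toy setup: the "model" deterministically replays a target completion.
target = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b", "<eos>"]


def toy_next_token(prefix: List[Token]) -> Token:
    return target[len(prefix)] if len(prefix) < len(target) else "<eos>"


# The datastore holds one similar function, as might be found in the same repository.
corpus = [["def", "sub", "(", "a", ",", "b", ")", ":", "return", "a", "-", "b", "<eos>"]]
store = build_datastore(corpus)
out, steps = generate(["def", "add"], toy_next_token, store)
print(out == target, steps)
```

In this toy run the output is identical to plain greedy decoding, because every accepted token is checked against the model's own choice. Decoding only takes fewer steps when the retrieved drafts match, which is the property that retrieval-based acceleration relies on.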
Findings
Achieves up to 2.54x speedup in code generation tasks.
Outperforms state-of-the-art inference acceleration methods by up to 88%.
Can be integrated with existing approaches for over 2.6x speedup.
Abstract
Code generation is a latency-sensitive task that demands high timeliness. However, with the growing interest and inherent difficulty in repository-level code generation, most existing code generation studies focus on improving the correctness of generated code while overlooking the inference efficiency, which is substantially affected by the overhead during LLM generation. Although there has been work on accelerating LLM inference, these approaches are not tailored to the specific characteristics of code generation; instead, they treat code the same as natural language sequences and ignore its unique syntax and semantic characteristics, which are also crucial for improving efficiency. Consequently, these approaches exhibit limited effectiveness in code generation tasks, particularly for repository-level scenarios with considerable complexity and difficulty. To alleviate this issue,…
Taxonomy
Methods: ALIGN · Focus
