When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention
Lianghong Guo, Yanlin Wang, Ensheng Shi, Wanjun Zhong, Hongyu Zhang, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng

TL;DR
This paper introduces CodeFast, a method to improve code generation efficiency in large language models by detecting and stopping unnecessary token generation, significantly speeding up inference without losing quality.
Contribution
The paper proposes CodeFast, a novel inference acceleration approach that uses a trained model, GenGuard, to predict when code generation should terminate, reducing computational waste.
Findings
Speedup of 34% to 452% in inference time across models
Maintains code quality despite acceleration
Effective across multiple programming languages and datasets
Abstract
Code generation aims to automatically generate code snippets that meet given natural language requirements and plays an important role in software development. Although Code LLMs have shown excellent performance in this domain, their long generation time poses a significant limitation in practical use. In this paper, we first conduct an in-depth preliminary study with different Code LLMs on code generation tasks and identify a significant efficiency issue: the continual generation of excess tokens. This harms developer productivity and leads to substantial computational waste. To address it, we introduce CodeFast, an inference acceleration approach for Code LLMs on code generation. The key idea of CodeFast is to terminate the inference process in time when unnecessary excess tokens are detected. First, we propose an automatic data construction framework to obtain training data. Then, we…
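The key idea (stop decoding as soon as the remaining tokens would be excess) can be illustrated with a minimal sketch. It assumes a token-by-token greedy decoding loop and a stop classifier that returns a score in [0, 1]. The names `generate_with_early_stop`, `next_token`, `stop_score`, `toy_guard`, and the 0.5 threshold are hypothetical; the abstract does not specify GenGuard's inputs, architecture, or decision rule, so a hand-written rule stands in for the trained model here.

```python
from typing import Callable, Iterator, List, Sequence

# Hypothetical interfaces: a decoding step that returns the next token, and a
# stop scorer standing in for a trained guard model such as GenGuard.
NextToken = Callable[[Sequence[str]], str]
StopScore = Callable[[Sequence[str]], float]


def generate_with_early_stop(
    prompt: List[str],
    next_token: NextToken,
    stop_score: StopScore,
    max_new_tokens: int = 64,
    threshold: float = 0.5,
    eos: str = "<eos>",
) -> List[str]:
    """Greedy decoding that halts once the stop scorer flags the output as complete."""
    tokens = list(prompt)
    generated: List[str] = []
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == eos:  # the model's own end-of-sequence signal
            break
        tokens.append(tok)
        generated.append(tok)
        # Ask the (hypothetical) guard whether any further tokens would be excess.
        if stop_score(tokens) >= threshold:
            break
    return generated


def make_toy_model(script: List[str]) -> NextToken:
    """Toy 'LLM' that replays a fixed token script, then emits <eos>."""
    it: Iterator[str] = iter(script)
    return lambda tokens: next(it, "<eos>")


def toy_guard(tokens: Sequence[str]) -> float:
    """Toy stop rule: the function is complete once its return expression appears."""
    return 1.0 if tokens[-1] == "a+b" else 0.0


prompt = ["#", "add", "two", "numbers"]
# The toy model keeps generating usage and test code after the function body.
script = ["def", "add(a,", "b):", "return", "a+b",
          "print(add(1,", "2))", "assert", "add(2,", "2)==4"]

with_guard = generate_with_early_stop(prompt, make_toy_model(script), toy_guard)
without_guard = generate_with_early_stop(prompt, make_toy_model(script), lambda t: 0.0)
print(" ".join(with_guard))  # def add(a, b): return a+b
print(len(with_guard), "vs", len(without_guard), "tokens")  # 5 vs 10 tokens
```

Because the guard is consulted after every generated token, it must be far cheaper than a decoding step; otherwise its per-step overhead could cancel out the time saved by stopping early.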
