When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention

Lianghong Guo; Yanlin Wang; Ensheng Shi; Wanjun Zhong; Hongyu Zhang; Jiachi Chen; Ruikai Zhang; Yuchi Ma; Zibin Zheng

arXiv:2407.20042 · cs.SE · July 30, 2024

TL;DR

This paper introduces CodeFast, a method to improve code generation efficiency in large language models by detecting and stopping unnecessary token generation, significantly speeding up inference without losing quality.

Contribution

The paper proposes CodeFast, a novel inference acceleration approach that uses a trained model, GenGuard, to predict when to terminate code generation, reducing wasted computation.
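The stop-prediction idea can be illustrated with a minimal sketch of guarded decoding, assuming a decoder that emits one token at a time and a per-step predicate that decides whether to stop. Everything here is hypothetical: `toy_model_stream` replays scripted tokens instead of running an LLM, tokens are whole lines for readability, and `heuristic_should_stop` is a hand-written rule standing in for GenGuard, which in the paper is a trained model.

```python
from typing import Callable, Iterable, List

# Hypothetical scripted output: a complete function followed by excess lines
# (a demo call and a test) that the user did not ask for.
EXAMPLE_TOKENS = [
    "def add(a, b):\n",
    "    return a + b\n",
    "\n",
    "print(add(1, 2))\n",
    "assert add(2, 2) == 4\n",
]


def toy_model_stream(tokens: Iterable[str]) -> Iterable[str]:
    """Stand-in for an LLM decoder: yields scripted tokens one at a time."""
    for token in tokens:
        yield token


def heuristic_should_stop(generated: List[str]) -> bool:
    """Hypothetical rule standing in for a trained stop predictor (not GenGuard).

    Signals a stop once a top-level function has been written and a new
    top-level statement (another def, a call, a test) begins after it.
    """
    lines = [ln for ln in "".join(generated).split("\n") if ln.strip()]
    seen_def = False
    for ln in lines:
        if ln.startswith("def "):
            if seen_def:
                return True  # a second top-level function: treat as excess
            seen_def = True
        elif seen_def and not ln[0].isspace():
            return True  # unindented code after the function body: excess
    return False


def generate_with_guard(
    stream: Iterable[str],
    should_stop: Callable[[List[str]], bool],
    max_tokens: int = 256,
) -> List[str]:
    """Pull tokens until the guard fires or max_tokens is reached."""
    generated: List[str] = []
    for step, token in enumerate(stream):
        if step >= max_tokens:
            break
        candidate = generated + [token]
        if should_stop(candidate):
            break  # stop decoding; the triggering token is discarded
        generated = candidate
    return generated


if __name__ == "__main__":
    result = generate_with_guard(toy_model_stream(EXAMPLE_TOKENS), heuristic_should_stop)
    print("".join(result), end="")
```

Discarding the token that triggers the stop is a choice made for this sketch. The savings come from never requesting the remaining tokens: here the `print` and `assert` lines are not generated at all.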

Findings

01

Inference speedups of 34% to 452% across the evaluated Code LLMs

02

Maintains code quality despite acceleration

03

Effective across multiple programming languages and datasets
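The Findings do not state how the speedup percentage is computed. Assuming it is the relative increase in speed, i.e. baseline inference time divided by CodeFast inference time, minus one, the reported range translates into the following time ratios (a worked illustration under that assumption only):

```latex
\begin{aligned}
S &= \frac{T_{\text{base}}}{T_{\text{CodeFast}}} - 1
  \;\Longrightarrow\; T_{\text{CodeFast}} = \frac{T_{\text{base}}}{1 + S} \\
S = 0.34 &: \quad T_{\text{CodeFast}} = \frac{T_{\text{base}}}{1.34} \approx 0.75\,T_{\text{base}} \\
S = 4.52 &: \quad T_{\text{CodeFast}} = \frac{T_{\text{base}}}{5.52} \approx 0.18\,T_{\text{base}}
\end{aligned}
```

Under this reading, the upper end corresponds to generation roughly 5.5 times faster than the baseline; a different definition of speedup would change these ratios.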

Abstract

Code generation aims to automatically generate code snippets that meet given natural language requirements and plays an important role in software development. Although Code LLMs have shown excellent performance in this domain, their long generation time poses a significant limitation in practical use. In this paper, we first conduct an in-depth preliminary study with different Code LLMs on code generation tasks and identify a significant efficiency issue, i.e., the continual generation of excess tokens. This harms developer productivity and leads to substantial computational waste. To address it, we introduce CodeFast, an inference acceleration approach for Code LLMs on code generation. The key idea of CodeFast is to terminate the inference process in time when unnecessary excess tokens are detected. First, we propose an automatic data construction framework to obtain training data. Then, we…
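The abstract does not detail the automatic data construction framework, so the following is only a hypothetical sketch of one way stop labels could be derived, assuming each sample comes with executable test code: find the shortest prefix of a generated token sequence that already passes the tests, and mark every later position as a point where generation should have stopped. The function names, the prefix-passes-tests criterion, and the 0/1 label convention are all assumptions for illustration, not the paper's procedure.

```python
from typing import List, Optional


def passes(code: str, test: str) -> bool:
    """Return True if `code` followed by `test` runs without raising.

    Note: exec() runs arbitrary code; a real pipeline would sandbox this.
    """
    namespace: dict = {}
    try:
        exec(code + "\n" + test, namespace)
    except Exception:  # includes SyntaxError from incomplete prefixes
        return False
    return True


def label_excess_tokens(tokens: List[str], test: str) -> Optional[List[int]]:
    """Hypothetical labeling: 1 = stop before emitting this token, 0 = continue.

    The cut point is the shortest prefix tokens[:k] that passes `test`.
    Returns None when no prefix passes, i.e. the sample is unusable.
    """
    for k in range(len(tokens) + 1):
        if passes("".join(tokens[:k]), test):
            return [1 if i >= k else 0 for i in range(len(tokens))]
    return None


if __name__ == "__main__":
    sample = [
        "def add(a, b):\n",
        "    return a + b\n",
        "\n",
        "print(add(1, 2))\n",
        "assert add(2, 2) == 4\n",
    ]
    print(label_excess_tokens(sample, "assert add(1, 2) == 3"))  # [0, 0, 1, 1, 1]
```

In this toy sample the function is complete after two lines, so the blank line, the demo call, and the extra assertion are all labeled as excess.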

Code & Models

Repositories

deepsoftwareanalytics/codefast (official)

Taxonomy

Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Advancements in Semiconductor Devices and Circuit Design