Let the Code LLM Edit Itself When You Edit the Code

Zhenyu He; Jun Zhang; Shengjie Luo; Jingjing Xu; Zhi Zhang; Di He

arXiv:2407.03157·cs.CL·March 5, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces PIE, a novel encoding method that significantly reduces computational costs in code editing scenarios for large language models, while maintaining high prediction accuracy.

Contribution

PIE modifies rotary positional encoding to eliminate temporal confusion, enabling efficient and accurate code editing with minimal recomputation.

Findings

01. PIE reduces computational overhead by over 85%.

02. PIE maintains model performance close to full recomputation.

03. Effective across multiple model sizes and coding tasks.

Abstract

In this work, we investigate a typical scenario in code generation where a developer edits existing code in real time and requests a code assistant, e.g., a large language model, to re-predict the next token or next line on the fly. Naively, the LLM needs to re-encode the entire KV cache to provide an accurate prediction. However, this process is computationally expensive, especially when the sequence length is long. Simply encoding the edited subsequence and integrating it to the original KV cache meets the temporal confusion problem, leading to significantly worse performance. We address this efficiency and accuracy trade-off by introducing Positional Integrity Encoding (PIE). Building upon the rotary positional encoding, PIE first removes the rotary matrices in the Key cache that introduce temporal confusion and then reapplies the correct rotary…
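The remove-then-reapply step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`apply_rope`, `reposition_keys`), the interleaved pair layout, and the base of 10000 are assumptions chosen for illustration. The sketch relies only on the fact that 2D rotations compose additively, so undoing the rotation for an old position and applying the rotation for a new position equals a single rotation by the position difference. It corrects positions only; in a real model, cached keys in deeper layers also depend on the edited content, which this sketch does not address.

```python
import numpy as np

def apply_rope(x, positions, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by positions * inv_freq[i].

    x: (seq_len, head_dim) array, head_dim even.
    positions: (seq_len,) array of (possibly negative) position offsets.
    """
    head_dim = x.shape[-1]
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    ang = np.outer(positions, inv_freq)          # (seq_len, head_dim / 2)
    cos, sin = np.cos(ang), np.sin(ang)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

def reposition_keys(cached_keys, old_positions, new_positions, base=10000.0):
    """Hypothetical helper: move already-rotated keys to new positions.

    Removing the rotation for old_positions and applying the rotation for
    new_positions collapses into one rotation by (new - old).
    """
    delta = np.asarray(new_positions) - np.asarray(old_positions)
    return apply_rope(cached_keys, delta, base)

# Toy example: 6 cached keys, then 3 tokens are inserted before position 2,
# so every cached token at position >= 2 shifts right by 3.
rng = np.random.default_rng(0)
raw_keys = rng.standard_normal((6, 8))           # keys before any rotation
old_pos = np.arange(6)
cached = apply_rope(raw_keys, old_pos)           # what the KV cache holds

new_pos = np.where(old_pos >= 2, old_pos + 3, old_pos)
fixed = reposition_keys(cached, old_pos, new_pos)

# The repositioned keys match keys rotated from scratch at the new positions.
assert np.allclose(fixed, apply_rope(raw_keys, new_pos))
```

In this toy setting the correction costs one elementwise rotation per cached key, whereas rotating from scratch would require the unrotated keys, which a KV cache typically does not keep.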

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 3 · Confidence 4

Strengths

The paper effectively outlines the real-time editing problem and clearly describes the mathematical foundation for PIE based on rotary positional encoding.

Weaknesses

**Limited Technical Novelty**: The mathematical derivation is relatively straightforward, stemming directly from rotary positional encoding's relative nature, without additional innovation or complexity.

**Unrealistic Setting for Interactive Editing**:

* Random Edits Only: The experimental setup evaluates PIE on random edits, which does not align with realistic real-time editing workflows, where temporally or contextually related edits are more common (e.g., editing a function signature and th

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The paper is well-written and easy to understand.
2. The authors address the important problem of efficiently updating the KV cache in a real-time code editing setting. This is crucial for interactive coding-assistant scenarios, where developers make frequent, incremental changes to existing code and need the copilot to correctly predict the next line on the fly.
3. The authors perform experiments on 1 dataset for 3 tasks and show 85% reduction in computational overhead compared to brute-for

Weaknesses

1. The results are limited to 1 dataset and 1 model. Including more than one dataset and model would make the claim stronger.
2. The authors address the important problem of efficient real-time code editing, but do not discuss the limitations of this approach for other tasks where the semantic impact is large, or for large code edits.
3. The approach depends on RoPE and might not be suitable for models that do not use RoPE.

Reviewer 03 · Rating 8 · Confidence 4

Strengths

I really enjoy this paper! The Positional Integrity Encoding (PIE) introduced by the authors capitalizes on RoPE, adeptly addressing temporal disorientation by initially stripping away the rotary matrices responsible for confusion and subsequently reinstating the appropriate matrices through straightforward matrix multiplication. This capability to enhance computational efficiency without compromising accuracy is precisely the straightforward yet potent approach we value in the realm of langua

Weaknesses

My current concerns are regarding the selection of downstream tasks and evaluation metrics considered by the authors. (1) The tasks of code insertion, code deletion, and multi-place code editing that the authors have considered seem less critical and common in actual development scenarios compared to code generation. (2) The chosen evaluation metrics, EM (Exact Match) and ES (Edit Similarity), may not accurately assess the semantic correctness of the generated code. (3) The selection of model

Taxonomy

Topics: Digital Rights Management and Security