Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions
Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Jacob Ginesin, Edward Berman, George Chakhnashvili, Anton Lozhkov, Carolyn Jane Anderson, Arjun Guha

TL;DR
This paper evaluates large language models' ability to perform code editing tasks based on natural language instructions, introduces a benchmark and dataset, and demonstrates that fine-tuning open models can improve their editing performance.
Contribution
It presents a new benchmark and dataset for code editing tasks, and shows that fine-tuning open LLMs enhances their code editing capabilities, narrowing the gap with closed models.
Findings
GPT-3.5-Turbo outperforms open models in code editing.
Fine-tuning open models significantly improves their editing performance.
A new benchmark and dataset for code editing tasks are introduced.
Abstract
A significant amount of research is focused on developing and evaluating large language models for a variety of code synthesis tasks. These include synthesizing code from natural language, synthesizing tests from code, and synthesizing explanations of code. In contrast, the behavior of instructional code editing with LLMs is understudied. These are tasks in which the model is provided a block of code and an instruction to modify the code. The editing instruction may ask for a feature to be added or removed, describe a bug and ask for a fix, or ask for a different kind of solution. We introduce a carefully crafted benchmark of code editing tasks and use it to evaluate several cutting-edge LLMs. Our evaluation exposes a significant gap between the capabilities of state-of-the-art open and closed models. For example, even GPT-3.5-Turbo is better than the best open model at code editing…
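The abstract describes each task as a block of code plus a natural-language instruction to modify it. Below is a minimal sketch of how such a task could be scored. It assumes that each task also carries hidden test assertions and that an edit counts as correct when those tests pass. It also assumes the standard unbiased pass@k estimator from the Codex evaluation (Chen et al., 2021) as the metric, which is not stated above. `EditTask`, `passes`, `evaluate_task`, and `toy_model` are hypothetical names, not the benchmark's actual harness.

```python
from dataclasses import dataclass
from math import comb
from typing import Callable


@dataclass
class EditTask:
    """Hypothetical task record: code before editing, an instruction, hidden tests."""
    before: str
    instruction: str
    tests: str  # assertions the edited code must satisfy (assumed format)


def passes(edited_code: str, tests: str) -> bool:
    """Run a candidate edit, then the tests, in a fresh namespace.

    NOTE: exec() is NOT a sandbox; a real harness would isolate untrusted code.
    """
    namespace: dict = {}
    try:
        exec(edited_code, namespace)
        exec(tests, namespace)
        return True
    except Exception:
        return False


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c passed (assumed metric)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def evaluate_task(task: EditTask,
                  edit_model: Callable[[str, str], str],
                  n: int = 5, k: int = 1) -> float:
    """Sample n edits from a (hypothetical) model and score them with pass@k."""
    samples = [edit_model(task.before, task.instruction) for _ in range(n)]
    c = sum(passes(s, task.tests) for s in samples)
    return pass_at_k(n, c, k)


# Usage: a feature-addition instruction, one of the task kinds the abstract lists.
task = EditTask(
    before="def add(a, b):\n    return a + b\n",
    instruction="Add an optional third argument c, defaulting to 0, that is also summed.",
    tests="assert add(1, 2) == 3\nassert add(1, 2, 3) == 6\n",
)


def toy_model(before: str, instruction: str) -> str:
    # Stand-in for an LLM: always returns the same (correct) edit.
    return "def add(a, b, c=0):\n    return a + b + c\n"


print(evaluate_task(task, toy_model))  # prints 1.0
```

Scoring by executing tests, rather than by comparing the edit textually to a reference, accepts any functionally correct edit. The unedited `before` code fails the second assertion, so a model that ignores the instruction scores 0.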
Taxonomy
Topics: Digital Accessibility for Disabilities · Software Engineering Research · Open Education and E-Learning
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Residual Connection · Weight Decay · Dropout · Layer Normalization · Byte Pair Encoding
