ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?

Siddhant Waghjale; Vishruth Veerendranath; Zora Zhiruo Wang; Daniel Fried

arXiv:2407.14044·cs.CL·October 11, 2024

TL;DR

ECCO introduces a reproducible benchmark for evaluating and improving the efficiency of code generated by large language models, balancing efficiency gains with functional correctness across different approaches.

Contribution

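The paper presents ECCO, a new benchmark for assessing the efficiency of LLM-generated code, and investigates three approaches to improving efficiency without sacrificing correctness: in-context learning, iterative refinement with execution or NL feedback, and fine-tuning conditioned on execution and editing history.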

Findings

01

Execution information helps maintain correctness.

02

NL feedback improves efficiency.

03

Most methods moderately increase efficiency while degrading functional correctness.

Abstract

Although large language models (LLMs) have been largely successful in generating functionally correct programs, conditioning models to produce efficient solutions while ensuring correctness remains a challenge. Further, unreliability in benchmarking code efficiency is a hurdle across varying hardware specifications for popular interpreted languages such as Python. In this paper, we present ECCO, a reproducible benchmark for evaluating program efficiency via two paradigms: natural language (NL) based code generation and history-based code editing. On ECCO, we adapt and thoroughly investigate the three most promising existing LLM-based approaches: in-context learning, iterative refinement with execution or NL feedback, and fine-tuning conditioned on execution and editing history. While most methods degrade functional correctness and moderately increase program efficiency, we find that…
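The trade-off the abstract describes, keeping a program correct while making it faster, can be illustrated with a small sketch. This is not ECCO's evaluation harness: the helper names (`pass_rate`, `best_runtime`, `evaluate`), the toy programs, and the test cases are all hypothetical, and local wall-clock timing is exactly the kind of hardware-dependent measurement the paper flags as unreliable.

```python
import time
from typing import Callable, Sequence, Tuple

# Each test case is (arguments, expected output). Hypothetical format.
TestCase = Tuple[tuple, int]


def pass_rate(func: Callable, tests: Sequence[TestCase]) -> float:
    """Fraction of test cases for which func returns the expected output."""
    passed = sum(1 for args, expected in tests if func(*args) == expected)
    return passed / len(tests)


def best_runtime(func: Callable, tests: Sequence[TestCase], repeats: int = 5) -> float:
    """Smallest wall-clock time (seconds) over several runs of the whole test set."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for args, _expected in tests:
            func(*args)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on coarse timers.
    return max(best, 1e-9)


def evaluate(reference: Callable, candidate: Callable,
             tests: Sequence[TestCase]) -> Tuple[float, float]:
    """Return (candidate pass rate, speedup of candidate over reference)."""
    rate = pass_rate(candidate, tests)
    speedup = best_runtime(reference, tests) / best_runtime(candidate, tests)
    return rate, speedup


def slow_sum(n: int) -> int:
    """Toy 'original' program: sum 1..n with a loop."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def fast_sum(n: int) -> int:
    """Toy 'optimized' program: closed-form sum of 1..n."""
    return n * (n + 1) // 2


TESTS = [((10,), 55), ((1000,), 500500), ((100000,), 5000050000)]

if __name__ == "__main__":
    rate, speedup = evaluate(slow_sum, fast_sum, TESTS)
    print(f"pass rate: {rate:.2f}, speedup: {speedup:.1f}x")
```

An optimization only counts as an improvement in this sketch when the pass rate stays at 1.0. A faster candidate that fails test cases has traded correctness for speed, which is the failure mode the abstract reports for most methods.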

Code & Models

Datasets

CodeEff/ECCO
dataset · 171 dl

Videos

ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?

Taxonomy

Topics: Natural Language Processing Techniques