Evaluating Language Models for Efficient Code Generation

Jiawei Liu; Songrun Xie; Junhao Wang; Yuxiang Wei; Yifeng Ding; Lingming Zhang

arXiv:2408.06450·cs.SE·August 14, 2024·2 cites

TL;DR

This paper introduces Differential Performance Evaluation (DPE), a new framework for reliably assessing the efficiency of Large Language Models in code generation, addressing limitations of existing benchmarks.

Contribution

The paper presents DPE, a novel evaluation framework that uses efficiency-demanding tasks and compound metrics to better measure LLM code efficiency, along with the EvalPerf benchmark.

Findings

01. Model size has limited impact on code efficiency.

02. Instruction tuning improves both correctness and efficiency.

03. EvalPerf is a reliable and platform-independent benchmark.

Abstract

We introduce Differential Performance Evaluation (DPE), a framework designed to reliably evaluate Large Language Models (LLMs) for efficient code generation. Traditional coding benchmarks often fail to provide reliable insights into code efficiency, due to their reliance on simplistic test inputs and the absence of effective compound metrics. DPE addresses these issues by focusing on efficiency-demanding programming tasks and establishing an insightful compound metric for performance evaluation. DPE operates in two phases: To curate efficiency datasets, it selects efficiency-demanding tasks from existing coding benchmarks and generates computationally expensive inputs to stress the efficiency of LLM solutions. To assess the code efficiency, DPE profiles the new solution and compares it globally against a set of reference solutions that exhibit distinct efficiency levels, where the…
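The comparison step described above can be illustrated with a small sketch. The abstract is cut off before it states how the score is derived, so the rule used here is an assumption for illustration only: the score is the percentage of reference solutions whose profiled cost the new solution matches or beats. The function name `differential_score` and the use of a single scalar cost per solution are hypothetical and are not taken from the paper or from EvalPerf.

```python
from typing import Sequence


def differential_score(new_cost: float, reference_costs: Sequence[float]) -> float:
    """Return the percentage of reference solutions matched or beaten.

    ASSUMPTION: this scoring rule is illustrative only. Each cost is a
    single profiled number (for example, runtime on the same stress input),
    and a lower cost means a more efficient solution.
    """
    if not reference_costs:
        raise ValueError("at least one reference solution is required")
    # Count references that are no faster than the new solution.
    matched = sum(1 for cost in reference_costs if new_cost <= cost)
    return 100.0 * matched / len(reference_costs)


# Hypothetical reference solutions with distinct efficiency levels.
references = [1.0, 4.0, 10.0, 20.0]
score = differential_score(5.0, references)
```

With these hypothetical costs, a new solution profiled at 5.0 matches or beats two of the four references. Scoring against a spread of references, rather than a single baseline, is what lets the result place a solution on a scale of efficiency levels.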

Taxonomy

Topics: Model-Driven Software Engineering Techniques · Natural Language Processing Techniques · Software Testing and Debugging Techniques

Methods: Sparse Evolutionary Training