AdaptEval: A Benchmark for Evaluating Large Language Models on Code Snippet Adaptation

Tanghaoran Zhang; Xinjun Mao; Shangwen Wang; Yuxin Zhao; Yao Lu; Jin Zhang; Zhang Zhang; Kang Yang; Yue Yu

arXiv:2601.04540·cs.SE·January 9, 2026

TL;DR

AdaptEval is a new benchmark designed to evaluate large language models' ability to adapt code snippets, incorporating real-world context, multi-level annotations, and detailed testing to assess their practical adaptation skills.

Contribution

This paper introduces AdaptEval, the first benchmark specifically targeting LLMs' code snippet adaptation, with features supporting diverse, context-rich, and fine-grained evaluation.

Findings

01. LLMs show limited ability to follow explicit adaptation instructions

02. AdaptEval effectively assesses LLMs' adaptation performance

03. Empirical results highlight current limitations in reasoning LLMs

Abstract

Recent advancements in large language models (LLMs) have automated various software engineering tasks, with benchmarks emerging to evaluate their capabilities. However, for adaptation, a critical activity during code reuse, there is no benchmark to assess LLMs' performance, leaving their practical utility in this area unclear. To fill this gap, we propose AdaptEval, a benchmark designed to evaluate LLMs on code snippet adaptation. Unlike existing benchmarks, AdaptEval incorporates the following three distinctive features: First, Practical Context. Tasks in AdaptEval are derived from developers' practices, preserving rich contextual information from Stack Overflow and GitHub communities. Second, Multi-granularity Annotation. Each task is annotated with requirements at both task and adaptation levels, supporting the evaluation of LLMs across diverse adaptation scenarios. Third,…

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Scientific Computing and Data Management