ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming

Xinwei Yang; Zhaofeng Liu; Chen Huang; Jiashuai Zhang; Tong Zhang; Yifan Zhang; Wenqiang Lei

arXiv:2505.16667·cs.AI·May 23, 2025

TL;DR

This paper introduces ELABORATION, a benchmark for evaluating human-LLM collaboration in competitive programming. It includes a new taxonomy of human feedback, a purpose-built dataset (ELABORATIONSET), and an assessment framework for identifying the strengths and weaknesses of existing methods.

Contribution

It presents the first taxonomy of human feedback covering the entire programming process, a dataset (ELABORATIONSET) annotated for human-LLM collaboration, and a benchmark (ELABORATION) for assessing human-LLM competitive programming methods.

Findings

01. Identified key strengths and weaknesses of current approaches.
02. Provided a new dataset annotated for human feedback simulation.
03. Established a benchmark for future research in human-LLM collaborative programming.

Abstract

While recent research increasingly emphasizes the value of human-LLM collaboration in competitive programming and proposes numerous empirical methods, a comprehensive understanding remains elusive due to the fragmented nature of existing studies and their use of diverse, application-specific human feedback. Thus, our work serves a three-fold purpose: First, we present the first taxonomy of human feedback consolidating the entire programming process, which promotes fine-grained evaluation. Second, we introduce ELABORATIONSET, a novel programming dataset specifically designed for human-LLM collaboration, meticulously annotated to enable large-scale simulated human feedback and facilitate cost-effective real human interaction studies. Third, we introduce ELABORATION, a novel benchmark to facilitate a thorough assessment of human-LLM competitive programming. With ELABORATION, we pinpoint…

Code & Models

Repositories

scunlp/elaboration (official)

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Software Engineering Techniques and Practices · Reinforcement Learning in Robotics