ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming
Xinwei Yang, Zhaofeng Liu, Chen Huang, Jiashuai Zhang, Tong Zhang, Yifan Zhang, Wenqiang Lei

TL;DR
This paper introduces ELABORATION, a comprehensive benchmark for evaluating human-LLM collaboration in competitive programming, including a new taxonomy of human feedback, a specialized dataset, and an assessment framework to identify strengths and weaknesses.
Contribution
It presents the first taxonomy of human feedback in programming, a dedicated dataset for human-LLM collaboration, and a benchmark for thorough evaluation of methods.
Findings
Identifies key strengths and weaknesses of current human-LLM collaboration methods in competitive programming.
Provides ELABORATIONSET, a new dataset annotated to support simulated human feedback.
Establishes a benchmark for future research in human-LLM collaborative programming.
Abstract
While recent research increasingly emphasizes the value of human-LLM collaboration in competitive programming and proposes numerous empirical methods, a comprehensive understanding remains elusive due to the fragmented nature of existing studies and their use of diverse, application-specific human feedback. Thus, our work serves a three-fold purpose: First, we present the first taxonomy of human feedback consolidating the entire programming process, which promotes fine-grained evaluation. Second, we introduce ELABORATIONSET, a novel programming dataset specifically designed for human-LLM collaboration, meticulously annotated to enable large-scale simulated human feedback and facilitate cost-effective real human interaction studies. Third, we introduce ELABORATION, a novel benchmark to facilitate a thorough assessment of human-LLM competitive programming. With ELABORATION, we pinpoint…
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Software Engineering Techniques and Practices · Reinforcement Learning in Robotics
