Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code
Hyungjoo Chae, Taeyoon Kwon, Seungjun Moon, Yongho Song, Dongjin Kang,, Kai Tzu-iunn Ong, Beong-woo Kwak, Seonghyeon Bae, Seung-won Hwang, Jinyoung, Yeo

TL;DR
Coffee-Gym is a new RL environment with a dataset and reward function for training models to give helpful feedback on erroneous code, improving code editing capabilities of open-source LLMs.
Contribution
It introduces Coffee-Gym, a comprehensive environment with a dataset and reward function, addressing the lack of high-quality data for training code feedback models with RL.
Findings
Feedback models trained with Coffee-Gym outperform baselines.
Models achieve performance comparable to closed-source LLMs.
The dataset and models are publicly available.
Abstract
This paper presents Coffee-Gym, a comprehensive RL environment for training models that provide feedback on code editing. Coffee-Gym includes two major components: (1) Coffee, a dataset containing humans' code edit traces for coding questions and machine-written feedback for editing erroneous code; (2) CoffeeEval, a reward function that faithfully reflects the helpfulness of feedback by assessing the performance of the revised code in unit tests. With them, Coffee-Gym addresses the unavailability of high-quality datasets for training feedback models with RL, and provides more accurate rewards than the SOTA reward model (i.e., GPT-4). By applying Coffee-Gym, we elicit feedback models that outperform baselines in enhancing open-source code LLMs' code editing, making them comparable with closed-source LLMs. We make the dataset and the model checkpoint publicly available.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsIntelligent Tutoring Systems and Adaptive Learning · Educational Technology and Assessment
