Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code

Hyungjoo Chae; Taeyoon Kwon; Seungjun Moon; Yongho Song; Dongjin Kang; Kai Tzu-iunn Ong; Beong-woo Kwak; Seonghyeon Bae; Seung-won Hwang; Jinyoung Yeo

arXiv:2409.19715·cs.CL·October 7, 2024

Open Access

TL;DR

Coffee-Gym is a new RL environment with a dataset and reward function for training models to give helpful feedback on erroneous code, improving code editing capabilities of open-source LLMs.

Contribution

The paper introduces Coffee-Gym, an RL environment that pairs a dataset (Coffee) with a reward function (CoffeeEval), addressing the lack of high-quality data for training code feedback models with RL.

Findings

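01. Feedback models trained with Coffee-Gym outperform baselines at improving the code editing of open-source code LLMs.

02. With this feedback, open-source code LLMs reach code-editing performance comparable to closed-source LLMs.

03. The dataset and the model checkpoint are publicly available.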

Abstract

This paper presents Coffee-Gym, a comprehensive RL environment for training models that provide feedback on code editing. Coffee-Gym includes two major components: (1) Coffee, a dataset containing humans' code edit traces for coding questions and machine-written feedback for editing erroneous code; (2) CoffeeEval, a reward function that faithfully reflects the helpfulness of feedback by assessing the performance of the revised code in unit tests. With them, Coffee-Gym addresses the unavailability of high-quality datasets for training feedback models with RL, and provides more accurate rewards than the SOTA reward model (i.e., GPT-4). By applying Coffee-Gym, we elicit feedback models that outperform baselines in enhancing open-source code LLMs' code editing, making them comparable with closed-source LLMs. We make the dataset and the model checkpoint publicly available.
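
The idea behind CoffeeEval, scoring feedback by how the revised code performs in unit tests, can be illustrated with a minimal sketch. This is not the paper's implementation: the pass-rate formula, the `solve` entry point, the `(args, expected)` test format, and the `toy_editor` stand-in for a code-editing LLM are all assumptions made for illustration, and the abstract does not specify the exact reward definition.

```python
from typing import Callable, List, Tuple

# Assumed test format: (positional arguments, expected return value).
TestCase = Tuple[tuple, object]


def pass_rate(code: str, tests: List[TestCase], entry_point: str = "solve") -> float:
    """Fraction of unit tests passed by `code`; 0.0 if the code fails to load."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # illustration only: no sandboxing or timeouts
        func = namespace[entry_point]
    except Exception:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test counts as a failure
    return passed / len(tests) if tests else 0.0


def feedback_reward(
    erroneous_code: str,
    feedback: str,
    editor: Callable[[str, str], str],
    tests: List[TestCase],
) -> float:
    """Reward for `feedback`: unit-test pass rate of the code the editor
    produces when it revises `erroneous_code` using that feedback."""
    revised = editor(erroneous_code, feedback)
    return pass_rate(revised, tests)


def toy_editor(code: str, feedback: str) -> str:
    # Hypothetical stand-in for a code LLM that edits code given feedback.
    return code.replace("a - b", "a + b") if "add" in feedback else code


buggy = "def solve(a, b):\n    return a - b\n"
tests = [((1, 2), 3), ((0, 0), 0), ((5, 5), 10)]

helpful = feedback_reward(buggy, "Use addition instead of subtraction.", toy_editor, tests)
unhelpful = feedback_reward(buggy, "Looks fine.", toy_editor, tests)
print(helpful, unhelpful)
```

In this toy setup the helpful feedback leads to a revision that passes every test, while the unhelpful feedback leaves the bug in place and passes only the one test where subtraction and addition agree. A reward of this shape depends on the editor as well as the feedback, so a realistic setup would need a fixed editor model and sandboxed execution.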

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Educational Technology and Assessment