DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs

Zhihan Liu, Shenao Zhang, Yongfei Liu, Boyi Liu, Yingxiang Yang, and Zhaoran Wang

arXiv:2411.13611·cs.SE·December 11, 2024

TL;DR

This paper introduces DSTC, a novel framework that uses only self-generated tests and code snippets to improve code language models through direct preference learning, eliminating the need for external annotations.

Contribution

DSTC is a new method that constructs reliable preference pairs from self-generated data, enhancing code model accuracy without external annotations or reward models.

Findings

01. Improves pass@1 score across multiple benchmarks

02. Reduces reliance on costly annotated datasets

03. Enhances model performance without external annotations
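
For readers unfamiliar with the pass@1 metric named in the findings, the sketch below shows the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), which reduces to c/n when k = 1. The function name `pass_at_k` and its argument names are illustrative choices, not taken from the paper, and this page does not say which evaluation script the authors used.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all reference tests
    k: sampling budget being evaluated (k = 1 gives pass@1)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a passing sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # With 10 samples of which 3 pass, pass@1 is the plain pass rate 3/10.
    print(round(pass_at_k(n=10, c=3, k=1), 4))
```

A benchmark-level score is then the mean of this per-problem value over all problems.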

Abstract

Direct preference learning offers a promising and computation-efficient approach beyond supervised fine-tuning (SFT) for improving code generation in coding large language models (LMs). However, the scarcity of reliable preference data is a bottleneck for the performance of direct preference learning to improve the coding accuracy of code LMs. In this paper, we introduce Direct Preference Learning with Only Self-Generated Tests and Code (DSTC), a framework that leverages only self-generated code snippets and tests to construct reliable preference pairs such that direct preference learning can improve LM coding accuracy without external annotations. DSTC combines a minimax selection process and test-code concatenation to improve preference pair quality, reducing the influence of incorrect self-generated tests and…
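
The general idea in the abstract, building a preference pair from self-generated code and tests, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names `passes` and `build_pair` are hypothetical, the selection rule (prefer the passing pair whose code passes the most tests, reject the failing pair whose code passes the fewest) is one simple max/min heuristic chosen for the example because the abstract does not spell out the exact minimax rule, and joining code and test with a blank line is only one possible form of test-code concatenation.

```python
from typing import Optional


def passes(code: str, test: str) -> bool:
    """Return True if running the candidate code and then the test raises nothing.

    Assumption: tests are plain assert statements. exec is NOT sandboxed here;
    a real pipeline would need process isolation and timeouts.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(test, namespace)
        return True
    except Exception:
        return False


def build_pair(codes: list[str], tests: list[str]) -> Optional[dict]:
    """Pick one chosen and one rejected code-test pair (illustrative rule)."""
    matrix = [[passes(c, t) for t in tests] for c in codes]
    # How many tests each code passes, and how many codes pass each test.
    code_score = [sum(row) for row in matrix]
    test_score = [sum(matrix[i][j] for i in range(len(codes)))
                  for j in range(len(tests))]
    cells = [(i, j) for i in range(len(codes)) for j in range(len(tests))]
    passing = [p for p in cells if matrix[p[0]][p[1]]]
    failing = [p for p in cells if not matrix[p[0]][p[1]]]
    if not passing or not failing:
        return None  # no contrast available, so no preference pair
    # Chosen: passing pair with the strongest code, then the most widely passed test.
    ci, cj = max(passing, key=lambda p: (code_score[p[0]], test_score[p[1]]))
    # Rejected: failing pair with the weakest code, tie-broken toward the test
    # that most other codes pass (so the failure is less likely the test's fault).
    ri, rj = min(failing, key=lambda p: (code_score[p[0]], -test_score[p[1]]))
    # Concatenate code and test so each side of the pair carries both.
    return {
        "chosen": codes[ci] + "\n\n" + tests[cj],
        "rejected": codes[ri] + "\n\n" + tests[rj],
    }


if __name__ == "__main__":
    codes = ["def add(a, b):\n    return a + b",
             "def add(a, b):\n    return a - b"]
    tests = ["assert add(1, 2) == 3", "assert add(0, 0) == 0"]
    print(build_pair(codes, tests))
```

The resulting `chosen` and `rejected` strings have the shape that direct preference learning methods typically consume, alongside the prompt that produced them.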

Taxonomy

Topics: Educational Technology and Assessment