GiFT: Gibbs Fine-Tuning for Code Generation

Haochen Li; Wanjin Feng; Xin Zhou; Zhiqi Shen

arXiv:2502.11466 · cs.LG · May 22, 2025

TL;DR

GiFT introduces a novel self-training method for code generation that mitigates bias by sampling from the joint description-code space, improving model performance on challenging benchmarks.

Contribution

The paper proposes Gibbs Fine-Tuning (GiFT), a new self-training approach inspired by Gibbs sampling, to better utilize the joint distribution of descriptions and code in LLM fine-tuning.

Findings

1. GiFT outperforms baseline models on multiple datasets.
2. The method effectively addresses long-tail distribution issues.
3. Empirical results show improved performance on challenging benchmarks.

Abstract

Training Large Language Models (LLMs) with synthetic data is a prevalent practice in code generation. A key approach is self-training, where LLMs are iteratively trained on self-generated correct code snippets. In this setting, the self-generated code is drawn from a conditional distribution, conditioned on a specific seed description. However, the seed description is not the only valid representation of its intended meaning. If all valid descriptions and code snippets form a joint space, code drawn only from the conditional distribution underrepresents the full description-code space. We therefore propose Gibbs Fine-Tuning (GiFT), a novel self-training method inspired by Gibbs sampling. GiFT allows self-generated data to be drawn from the marginal distribution of the joint space, thereby mitigating the biases inherent in conditional sampling. We provide a…
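To make the abstract's sampling argument concrete, here is a toy sketch, assuming a small hand-picked joint table p(d, c) over two descriptions and three code snippets (all labels and probabilities are hypothetical). It contrasts drawing code only from p(c | seed description) with a textbook two-variable Gibbs chain that alternates c ~ p(c | d) and d ~ p(d | c). It is not the authors' GiFT implementation: real self-training would replace these table lookups with model generations, which this sketch does not attempt to model.

```python
import random
from collections import Counter

# Hypothetical toy joint distribution p(d, c): two descriptions, three code snippets.
# Values are illustrative only and sum to 1.
JOINT = {
    ("d0", "c0"): 0.35, ("d0", "c1"): 0.10, ("d0", "c2"): 0.05,
    ("d1", "c0"): 0.05, ("d1", "c1"): 0.15, ("d1", "c2"): 0.30,
}
DESCRIPTIONS = sorted({d for d, _ in JOINT})
CODES = sorted({c for _, c in JOINT})


def sample_code_given_desc(d, rng):
    """Draw c ~ p(c | d); random.choices normalizes the p(d, c) weights."""
    weights = [JOINT[(d, c)] for c in CODES]
    return rng.choices(CODES, weights=weights)[0]


def sample_desc_given_code(c, rng):
    """Draw d ~ p(d | c) by weighting each description with p(d, c)."""
    weights = [JOINT[(d, c)] for d in DESCRIPTIONS]
    return rng.choices(DESCRIPTIONS, weights=weights)[0]


def conditional_samples(seed_desc, n, rng):
    """Conditional-only view: every code is drawn from p(c | seed_desc)."""
    return [sample_code_given_desc(seed_desc, rng) for _ in range(n)]


def gibbs_samples(seed_desc, n, rng, burn_in=100):
    """Alternate c ~ p(c | d) and d ~ p(d | c), starting from the seed.
    After burn-in, the recorded codes follow the marginal p(c) = sum_d p(d, c)."""
    d = seed_desc
    out = []
    for step in range(burn_in + n):
        c = sample_code_given_desc(d, rng)
        d = sample_desc_given_code(c, rng)
        if step >= burn_in:
            out.append(c)
    return out


def frequencies(samples):
    """Empirical frequency of each code snippet."""
    counts = Counter(samples)
    return {c: counts[c] / len(samples) for c in CODES}


def marginal_code():
    """Exact marginal p(c) computed from the joint table."""
    return {c: sum(JOINT[(d, c)] for d in DESCRIPTIONS) for c in CODES}


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000
    print("target marginal p(c):", marginal_code())
    print("conditional on d0:   ", frequencies(conditional_samples("d0", n, rng)))
    print("Gibbs chain from d0: ", frequencies(gibbs_samples("d0", n, rng)))
```

With this table, p(c2 | d0) = 0.1 while the marginal p(c2) = 0.35, so sampling only from the seed description underrepresents c2, whereas the Gibbs chain's code frequencies approach the marginal (0.40, 0.25, 0.35) even though it starts from the same seed.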

Code & Models

Repositories

alex-haochenli/gift (Official)

Taxonomy

Topics: Neural Networks and Applications