G-Zero: Self-Play for Open-Ended Generation from Zero Data

Chengsong Huang; Haolin Liu; Tong Zheng; Runpeng Dai; Langlin Huang; Jinyuan Li; Zongxia Li; Zhepei Wei; Yu Meng; Jiaxin Huang

arXiv:2605.09959 · cs.LG · May 12, 2026

TL;DR

G-Zero is a verifier-free, co-evolutionary framework that lets large language models self-improve on open-ended tasks without external judges, using an intrinsic reward derived from the model's own predictions.

Contribution

The paper proposes Hint-$\delta$, a novel intrinsic reward, and a co-evolutionary training method for LLMs that eliminates reliance on external evaluators.

Findings

01. G-Zero achieves continuous self-improvement without external verification.

02. Theoretical guarantees are provided for the idealized DPO version of G-Zero.

03. The framework effectively targets model blind spots through internal distributional signals.

Abstract

Self-evolving LLMs excel in verifiable domains but struggle in open-ended tasks, where reliance on proxy LLM judges introduces capability bottlenecks and reward hacking. To overcome this, we introduce G-Zero, a verifier-free, co-evolutionary framework for autonomous self-improvement. Our core innovation is Hint-$\delta$, an intrinsic reward that quantifies the predictive shift between a Generator model's unassisted response and its response conditioned on a self-generated hint. Using this signal, a Proposer model is trained via GRPO to continuously target the Generator's blind spots by synthesizing challenging queries and informative hints. The Generator is concurrently optimized via DPO to internalize these hint-guided improvements. Theoretically, we prove a best-iterate suboptimality guarantee for an idealized standard-DPO version of G-Zero, provided that the Proposer induces…
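The three training signals named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact definition of Hint-$\delta$, so `hint_delta` below assumes one plausible form (the mean per-token log-probability gain on a response when the hint is added to the context). The function names, the `beta` default, and the choice of the hint-guided response as the DPO "chosen" sample are all assumptions made for illustration. `grpo_advantages` and `dpo_loss` follow the standard published forms of group-relative advantage normalization and the DPO objective.

```python
import math
from statistics import mean, pstdev


def hint_delta(logp_with_hint, logp_without_hint):
    """ASSUMED form of a hint-induced predictive shift (not from the paper).

    Both arguments are per-token log-probs of the same response tokens,
    scored by the Generator with and without the hint in context.
    """
    if len(logp_with_hint) != len(logp_without_hint):
        raise ValueError("score the same response tokens under both contexts")
    return mean(w - wo for w, wo in zip(logp_with_hint, logp_without_hint))


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, from sequence log-probs."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)), computed as a numerically stable softplus(-margin)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Hypothetical numbers: the hint makes each response token more likely.
    shift = hint_delta([-1.0, -2.0], [-2.0, -4.0])
    # Proposer side: rewards for a group of sampled (query, hint) proposals.
    advantages = grpo_advantages([shift, 0.2, 0.9, 0.1])
    # Generator side (assumed pairing): hint-guided response is "chosen".
    loss = dpo_loss(-12.0, -15.0, -13.0, -13.0)
    print(shift, advantages, loss)
```

Under these assumptions, a larger shift means the hint changed the Generator's predictions more, which is the kind of signal the abstract says the Proposer is rewarded for, while the DPO term pushes the Generator toward the hint-guided response without the hint in context.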

Code & Models

Repositories

chengsong-huang/G-Zero (GitHub)
