Writing-Zero: Bridge the Gap Between Non-verifiable Tasks and Verifiable Rewards

Ruipeng Jia; Yunyi Yang; Yongbo Gai; Kai Luo; Shihao Huang; Jianhe Lin; Xiaoxi Jiang; Guanjun Jiang

arXiv:2506.00103·cs.CL·June 12, 2025

TL;DR

This paper introduces Writing-Zero, a training framework based on reinforcement learning with verifiable rewards (RLVR) that improves language models' creative writing by turning subjective quality assessments into verifiable rewards, which reduces reward hacking and raises performance on writing tasks.

Contribution

It proposes a unified RLVR paradigm that combines a pairwise Generative Reward Model (GenRM) with the BRPO algorithm, enabling robust, reference-free training for creative language tasks without supervised fine-tuning.
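As a rough illustration of how a pairwise judgment can be turned into a checkable, binary-style reward, the sketch below scores each response in a sampled group against one reference drawn from that same group. This is not the paper's GenRM or BRPO implementation, whose details are not given on this page. The function names, the `judge` interface, the +1/-1 reward values, and the toy preference rule are all assumptions made for illustration.

```python
import random
from typing import Callable, List


def pairwise_rewards(
    responses: List[str],
    judge: Callable[[str, str], int],
    rng: random.Random,
) -> List[float]:
    """Score each response against one reference drawn from the same group.

    `judge(a, b)` is a hypothetical stand-in for a pairwise reward model:
    it returns 1 if `a` is preferred over `b`, otherwise 0.
    """
    ref_index = rng.randrange(len(responses))
    reference = responses[ref_index]
    rewards = []
    for i, response in enumerate(responses):
        if i == ref_index:
            # The reference is not compared with itself (assumed convention).
            rewards.append(0.0)
        else:
            rewards.append(1.0 if judge(response, reference) == 1 else -1.0)
    return rewards


def toy_judge(a: str, b: str) -> int:
    # Toy preference for illustration only: more distinct words wins.
    return 1 if len(set(a.split())) > len(set(b.split())) else 0


group = ["the cat sat", "a quiet cat sat by the window", "cat cat cat"]
rewards = pairwise_rewards(group, toy_judge, random.Random(0))
print(rewards)
```

In practice the judge would be a learned model rather than a word-count rule. The point of the sketch is only that a win/lose outcome against a reference is a discrete signal that can be checked, unlike a free-floating scalar score.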

Findings

1. Writing-Zero improves writing quality and robustness against reward hacking.
2. The approach achieves competitive results on multiple writing benchmarks.
3. It demonstrates potential to unify various reward modeling methods under RLVR.

Abstract

Reinforcement learning with verifiable rewards (RLVR) has enabled large language models (LLMs) to achieve remarkable breakthroughs in reasoning tasks with objective ground-truth answers, such as mathematics and code generation. However, a significant gap remains for non-verifiable tasks, like creative writing and open-ended dialogue, where quality assessment is inherently subjective and lacks definitive references. Existing approaches for these domains often rely on scalar reward models trained with human preferences, which suffer from limited generalization and are prone to reward hacking, such as over-explanation and length bias. In this work, we propose a unified RLVR-based training paradigm that bridges the gap between non-verifiable tasks and verifiable rewards. We introduce a writing-principle-based pairwise Generative Reward Model (GenRM) and a novel Bootstrapped Relative Policy…

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications