UniCreative: Unifying Long-form Logic and Short-form Sparkle via Reference-Free Reinforcement Learning

Xiaolong Wei; Zerun Zhu; Simin Niu; Xingyu Zhang; Peiying Yu; Changxuan Xiao; Yuchen Li; Jicheng Yang; Zhejun Zhao; Chong Meng; Long Xia; Daiting Shi

arXiv:2604.05517·cs.AI·April 8, 2026


TL;DR

UniCreative introduces a unified, reference-free reinforcement learning framework for creative writing. It dynamically balances global coherence in long-form writing against local expressiveness in short-form writing, without relying on costly high-quality supervised data.

Contribution

It proposes two methods: AC-GenRM, an adaptive constraint-aware reward model, and ACPO, a policy optimization algorithm. Together they align the policy with human preferences across diverse writing styles.

Findings

01

AC-GenRM closely matches expert evaluations.

02

ACPO improves performance on various writing tasks.

03

The model learns to distinguish task types autonomously.

Abstract

A fundamental challenge in creative writing lies in reconciling the inherent tension between maintaining global coherence in long-form narratives and preserving local expressiveness in short-form texts. While long-context generation necessitates explicit macroscopic planning, short-form creativity often demands spontaneous, constraint-free expression. Existing alignment paradigms, however, typically employ static reward signals and rely heavily on high-quality supervised data, which is costly and difficult to scale. To address this, we propose UniCreative, a unified reference-free reinforcement learning framework. We first introduce AC-GenRM, an adaptive constraint-aware reward model that dynamically synthesizes query-specific criteria to provide fine-grained preference judgments. Leveraging these signals, we propose ACPO, a policy optimization algorithm that…
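The abstract describes AC-GenRM only at a high level (query-specific criteria feeding fine-grained preference judgments), and it is truncated before ACPO's details. The Python sketch below is therefore a rough illustration under stated assumptions, not the paper's method. It shows how per-query criteria could be combined into a pairwise preference, and how pairwise preferences within a group of sampled outputs could yield a reference-free reward (a group win rate) that needs no gold text. Everything in it is hypothetical: the names `synthesize_criteria`, `pairwise_preference`, and `group_win_rate_rewards`, the keyword heuristic, and the criterion weights. A real system would presumably prompt a model to synthesize criteria and to act as the judge.

```python
from dataclasses import dataclass
from typing import Callable, List

# Judge signature (hypothetical): (query, criterion, a, b) -> P(a beats b) in [0, 1].
Judge = Callable[[str, str, str, str], float]


@dataclass
class Criterion:
    name: str
    weight: float


def synthesize_criteria(query: str) -> List[Criterion]:
    """Hypothetical stand-in for query-specific criteria synthesis.

    A keyword heuristic decides whether the query looks long-form
    (coherence/planning criteria) or short-form (expressiveness criteria).
    """
    long_form_markers = ("chapter", "novel", "long")
    if any(m in query.lower() for m in long_form_markers):
        return [Criterion("global_coherence", 0.5),
                Criterion("plot_planning", 0.3),
                Criterion("style", 0.2)]
    return [Criterion("expressiveness", 0.5),
            Criterion("originality", 0.3),
            Criterion("concision", 0.2)]


def pairwise_preference(query: str, a: str, b: str, judge: Judge) -> float:
    """Weighted average of per-criterion judgments that `a` beats `b`."""
    criteria = synthesize_criteria(query)
    total = sum(c.weight for c in criteria)
    return sum(c.weight * judge(query, c.name, a, b) for c in criteria) / total


def group_win_rate_rewards(query: str, samples: List[str], judge: Judge) -> List[float]:
    """Reference-free reward: each sample's mean preference over the
    other samples in its group, so no gold reference text is required."""
    rewards = []
    for i, s in enumerate(samples):
        prefs = [pairwise_preference(query, s, t, judge)
                 for j, t in enumerate(samples) if j != i]
        rewards.append(sum(prefs) / len(prefs))
    return rewards


if __name__ == "__main__":
    # Toy judge for demonstration only: prefers the longer text on every criterion.
    def toy_judge(query: str, criterion: str, a: str, b: str) -> float:
        if len(a) == len(b):
            return 0.5
        return 1.0 if len(a) > len(b) else 0.0

    print(group_win_rate_rewards("Write a haiku", ["a", "bbb", "cc"], toy_judge))
    # prints [0.0, 1.0, 0.5]
```

In a policy-gradient setup, rewards like these would typically be normalized within each group before the update. How ACPO actually consumes AC-GenRM's signals is not stated in the available abstract.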
