RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing

Jianxing Liao; Tian Zhang; Xiao Feng; Yusong Zhang; Rui Yang; Haorui Wang; Bosi Wen; Ziying Wang; Runzhi Shi

arXiv:2508.18642 · cs.AI · August 29, 2025

TL;DR

This paper introduces RLMR, a reinforcement learning approach that dynamically balances subjective writing quality and objective constraints, significantly improving creative writing performance in large language models.

Contribution

RLMR is the first method to combine subjective preferences with objective verification in online RL training for creative writing.

Findings

1. Improved instruction following from 83.36% to 86.65%.
2. Achieved a 72.75% win rate in manual evaluations.
3. Demonstrated effectiveness across models from 8B to 72B parameters.

Abstract

Large language models are extensively utilized in creative writing applications. Creative writing requires a balance between subjective writing quality (e.g., literariness and emotional expression) and objective constraint following (e.g., format requirements and word limits). Existing methods find it difficult to balance these two aspects: single reward strategies fail to improve both abilities simultaneously, while fixed-weight mixed-reward methods lack the ability to adapt to different writing scenarios. To address this problem, we propose Reinforcement Learning with Mixed Rewards (RLMR), utilizing a dynamically mixed reward system from a writing reward model evaluating subjective writing quality and a constraint verification model assessing objective constraint following. The constraint following reward weight is adjusted dynamically according to the writing quality within sampled…
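
The contrast between a fixed-weight mix and a dynamically weighted mix can be illustrated with a small sketch. The abstract above is truncated before it states the actual weighting rule, so the rule used here is purely an assumption for illustration and should not be read as the paper's formula: the penalty for a violated constraint is set to the spread of writing scores within the sampled group plus a small margin, so that any violating sample ends up ranked below every compliant one. The function names, the `margin` parameter, and the example scores are all hypothetical.

```python
from typing import List


def fixed_weight_rewards(writing_scores: List[float],
                         constraint_ok: List[bool],
                         weight: float = 0.2) -> List[float]:
    """Fixed-weight baseline: subtract the same penalty for any violation."""
    return [score - (0.0 if ok else weight)
            for score, ok in zip(writing_scores, constraint_ok)]


def dynamic_weight_rewards(writing_scores: List[float],
                           constraint_ok: List[bool],
                           margin: float = 0.1) -> List[float]:
    """Hypothetical dynamic rule (NOT taken from the paper).

    The penalty adapts to the writing-score spread of the sampled group,
    so a violating sample always scores below every compliant sample.
    """
    spread = max(writing_scores) - min(writing_scores)
    penalty = spread + margin
    return [score - (0.0 if ok else penalty)
            for score, ok in zip(writing_scores, constraint_ok)]


# Illustrative group of three sampled responses to one prompt.
# The best-written response (0.9) violates a constraint such as a word limit.
scores = [0.9, 0.4, 0.6]
ok = [False, True, True]

fixed = fixed_weight_rewards(scores, ok)      # violator can still rank first
dynamic = dynamic_weight_rewards(scores, ok)  # violator ranks last
```

Under the fixed weight, a well-written response that breaks a constraint can still receive the highest reward in its group; under the assumed dynamic rule it cannot, whatever the range of writing scores for that prompt. This is one way to read the abstract's point that fixed weights do not adapt to different writing scenarios.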
