Writer-R1: Enhancing Generative Writing in LLMs via Memory-augmented Replay Policy Optimization
Jihao Zhao, Shuaishuai Zu, Zhiyuan Ji, Chunlai Zhou, Biao Qin

TL;DR
This paper introduces Memory-augmented Replay Policy Optimization (MRPO), a method that improves creative writing in large language models by guiding self-reflection with dynamically generated criteria and converting those evaluation criteria into reward signals for training.
Contribution
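It presents a multi-agent workflow, based on Grounded Theory, that produces interpretable and reusable fine-grained criteria, together with the MRPO algorithm, which uses those criteria in two ways: to guide model self-reflection without additional training, and as reward signals in a training paradigm that combines supervised fine-tuning with reinforcement learning.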
Findings
Criteria-based training matches human annotation performance.
Models outperform baselines and some 100B+ parameter models.
End-to-end optimization improves creative writing quality.
Abstract
As a typical open-ended generation task, creative writing lacks verifiable reference answers, which has long constrained reward modeling and automatic evaluation due to high human annotation costs, evaluative bias, and coarse feedback signals. To address these challenges, this paper first designs a multi-agent collaborative workflow based on Grounded Theory, performing dimensional decomposition and hierarchical induction of the problem to dynamically produce interpretable and reusable fine-grained criteria. Furthermore, we propose the Memory-augmented Replay Policy Optimization (MRPO) algorithm: on the one hand, without additional training, MRPO guides models to engage in self-reflection based on dynamic criteria, enabling controlled iterative improvement; on the other hand, we adopt the training paradigm that combines supervised fine-tuning with reinforcement learning to convert…
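The idea of turning fine-grained criteria into a reward signal can be illustrated with a minimal sketch. This is not the paper's implementation: the `Criterion` class, the per-criterion weights, the [0, 1] score range, and the weighted-mean aggregation are all assumptions chosen for illustration, and the per-criterion scores are taken as given (for example, from a judge model) rather than computed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """A hypothetical fine-grained writing criterion with a relative weight."""
    name: str
    weight: float


def criteria_reward(scores: dict[str, float], criteria: list[Criterion]) -> float:
    """Collapse per-criterion scores in [0, 1] into one scalar reward.

    Assumption: a weighted mean is used as the aggregation rule.
    The source text does not specify how criteria are combined.
    """
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("criteria weights must sum to a positive value")

    weighted_sum = 0.0
    for c in criteria:
        score = scores[c.name]  # raises KeyError if a criterion was not scored
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {c.name!r} must lie in [0, 1]")
        weighted_sum += c.weight * score
    return weighted_sum / total_weight


# Hypothetical criteria and judge scores for one generated draft.
criteria = [Criterion("coherence", 2.0), Criterion("originality", 1.0)]
reward = criteria_reward({"coherence": 0.9, "originality": 0.6}, criteria)
```

A scalar of this kind could then serve as the reward for a policy-optimization step, or as the comparison signal between successive drafts during self-reflection. How MRPO actually does either is not stated in the text above.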
Taxonomy
Topics: Artificial Intelligence in Games · Machine Learning in Materials Science · Topic Modeling
