# Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards

**Authors:** Xiaolong Wei, Bo Lu, Xingyu Zhang, Zhejun Zhao, Dongdong Shen, Long Xia, Dawei Yin

arXiv: 2508.21476 · 2025-09-01

## TL;DR

This paper compares two AI-driven reward strategies for improving the creative writing of a 7B-parameter small language model on Chinese greeting generation. Both strategies beat the baselines, and the principle-guided LLM-as-a-Judge produces higher-quality output than a reward model trained on multi-agent-curated preference data, while also training more efficiently.

## Contribution

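It studies two reward mechanisms within an RLAIF framework: a reward model trained on preference data curated by a multi-agent rejection sampling framework, and a principle-guided LLM-as-a-Judge whose reward function is optimized through adversarial training with a reflection mechanism. The LLM-as-a-Judge approach improves both generation quality and training efficiency for small language models.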

## Key findings

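- Both reward strategies significantly improve creative output over the baselines; of the two, the principle-guided LLM-as-a-Judge yields the higher generation quality.
- The LLM-as-a-Judge approach trains more efficiently and depends less on human-annotated data.
- The automated evaluation methods align strongly with human judgments.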

## Abstract

Large Language Models (LLMs) have demonstrated remarkable creative writing capabilities, yet their substantial computational demands hinder widespread use. Enhancing Small Language Models (SLMs) offers a promising alternative, but current methods like Supervised Fine-Tuning (SFT) struggle with novelty, and Reinforcement Learning from Human Feedback (RLHF) is costly. This paper explores two distinct AI-driven reward strategies within a Reinforcement Learning from AI Feedback (RLAIF) framework to ignite the creative writing of a 7B-parameter SLM, specifically for generating Chinese greetings. The first strategy employs a reward model (RM) trained on high-quality preference data curated by a novel multi-agent rejection sampling framework designed for creative tasks. The second, more novel strategy utilizes a principle-guided LLM-as-a-Judge, whose reward function is optimized via an adversarial training scheme with a reflection mechanism, to directly provide reward signals. Comprehensive experiments reveal that while both approaches significantly enhance creative output over baselines, the principle-guided LLM-as-a-Judge demonstrably yields superior generation quality. Furthermore, it offers notable advantages in training efficiency and reduced dependency on human-annotated data, presenting a more scalable and effective path towards creative SLMs. Our automated evaluation methods also exhibit strong alignment with human judgments. Our code and data are publicly available at https://github.com/weixiaolong94-hub/Igniting-Creative-Writing-in-Small-Language-Models.
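To make the contrast between the two reward strategies concrete, here is a minimal Python sketch. It is not the authors' implementation, and this summary does not give their details. The helper names (`build_preference_pairs`, `judge_reward`), the `min_gap` threshold, the mean-score aggregation across agents, and the binary judge verdict are all assumptions made for illustration. The pairwise loss is the standard Bradley-Terry objective often used to train reward models; whether the paper uses exactly this loss is not stated here.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical type aliases (not from the paper).
Scorer = Callable[[str, str], float]   # (prompt, response) -> score
Judge = Callable[[str, str], bool]     # (prompt, response) -> accept / reject


def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Standard pairwise preference loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # Numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def build_preference_pairs(
    prompt: str,
    candidates: List[str],
    agent_scorers: List[Scorer],
    min_gap: float = 1.0,  # assumed threshold, purely illustrative
) -> List[Tuple[str, str, str]]:
    """Rejection-sampling-style filter (assumed form): average the agents'
    scores and keep a (prompt, chosen, rejected) triple only when the best
    and worst candidates are clearly separated."""
    if len(candidates) < 2 or not agent_scorers:
        return []
    scored = sorted(
        ((sum(s(prompt, c) for s in agent_scorers) / len(agent_scorers), c)
         for c in candidates),
        key=lambda t: t[0],
        reverse=True,
    )
    (best_score, best), (worst_score, worst) = scored[0], scored[-1]
    if best_score - worst_score < min_gap:
        return []  # reject ambiguous samples
    return [(prompt, best, worst)]


def judge_reward(prompt: str, response: str, judge: Judge) -> float:
    """Strategy 2 (assumed binary form): the judge's verdict is used
    directly as the RL reward, with no separately trained reward model."""
    return 1.0 if judge(prompt, response) else 0.0
```

In the first strategy, triples like those returned by `build_preference_pairs` would feed a reward model trained with a pairwise loss. In the second, `judge_reward` skips that training step, which is one plausible reading of the efficiency advantage the abstract reports.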

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21476/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21476/full.md

## References

55 references — full list in the complete paper: https://tomesphere.com/paper/2508.21476/full.md

---
Source: https://tomesphere.com/paper/2508.21476