COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models
Tobias Materzok

TL;DR
COS(M+O)S introduces a novel framework combining curiosity-driven MCTS and reinforcement learning to enhance open-ended story generation with smaller language models, achieving quality comparable to much larger models.
Contribution
The paper presents COS(M+O)S, a method that systematically explores story space using curiosity-guided search and RL, bringing a small model's storytelling close to the quality of a much larger model.
Findings
COS(M+O)S's top story expansions are favored by participants.
COS(M+O)S surpasses naive decoding from smaller models.
Its plot quality approaches that of a much larger model (70B vs. 3B parameters) on select short-story tasks, despite the smaller model's capacity limits.
Abstract
We present COS(M+O)S, a System 2-inspired framework for open-ended plot development that systematically explores the vast space of possible story expansions, enabling a 3B-parameter language model to approach the plot quality of a 70B model on select short-story tasks. The method accomplishes this by combining Monte Carlo Tree Search (MCTS), guided by a step-level value model that rewards moderate surprisal (curiosity) while penalizing incoherence, and Odds Ratio Preference Optimization (ORPO) to fine-tune the policy on high-value plot expansions. This iterative reinforcement learning loop systematically explores multiple candidate plot branches, backpropagates quality signals, and adapts the policy for faster convergence, notably shifting the policy from puzzle-based Chain-of-Thought to more character-driven storytelling. In small-scale tests with short-story prompts, 67%-77% of…
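The search half of the loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the Gaussian-shaped `curiosity_value` stands in for the learned step-level value model, `propose` stands in for the language-model policy, the surprisal and coherence numbers are invented toy values, and the ORPO fine-tuning step is omitted entirely.

```python
import math

def curiosity_value(surprisal, coherence, target=0.5, width=0.2):
    # ASSUMPTION: an inverted-U (Gaussian bump) that peaks at moderate
    # surprisal, scaled by a coherence score in [0, 1]. Hypothetical stand-in
    # for the learned step-level value model.
    bump = math.exp(-((surprisal - target) ** 2) / (2 * width ** 2))
    return bump * coherence

class Node:
    def __init__(self, plot, parent=None):
        self.plot = plot            # list of (label, surprisal, coherence) steps
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

    def uct(self, c=1.4):
        # Standard UCT score: mean value plus an exploration bonus.
        if self.visits == 0:
            return math.inf
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def mcts(root, propose, evaluate, iterations=60, max_depth=4):
    for _ in range(iterations):
        node = root
        # Selection: descend by UCT until reaching a leaf.
        while node.children:
            node = max(node.children, key=Node.uct)
        # Expansion: grow the root or any already-visited leaf.
        if (node is root or node.visits > 0) and len(node.plot) < max_depth:
            for step in propose(node.plot):
                node.children.append(Node(node.plot + [step], parent=node))
            if node.children:
                node = node.children[0]
        # Evaluation, then backpropagation of the value to the root.
        value = evaluate(node.plot)
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Return the most-visited first plot expansion.
    return max(root.children, key=lambda n: n.visits)

# Toy policy (hypothetical): the same three candidate plot steps at every node.
CANDIDATES = [("cliche", 0.05, 0.9), ("twist", 0.5, 0.9), ("nonsense", 0.95, 0.2)]

def propose(plot):
    return CANDIDATES

def evaluate(plot):
    # Mean step-level value along the plot so far.
    if not plot:
        return 0.0
    return sum(curiosity_value(s, c) for _, s, c in plot) / len(plot)

root = Node(plot=[])
best = mcts(root, propose, evaluate)
print(best.plot[0][0], best.visits)
```

In a full loop of the kind the abstract describes, high-value and low-value expansions found by the search would then serve as preference pairs for fine-tuning the policy; this sketch stops at the search step.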
Taxonomy
Topics: Language and cultural evolution · Natural Language Processing Techniques · Topic Modeling
Methods: LLaMA
