COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models

Tobias Materzok

arXiv:2501.17104·cs.CL·January 29, 2025

Open Access

TL;DR

COS(M+O)S introduces a novel framework combining curiosity-driven MCTS and reinforcement learning to enhance open-ended story generation with smaller language models, achieving quality comparable to much larger models.

Contribution

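The paper presents COS(M+O)S, a method that systematically explores the space of possible story expansions using curiosity-guided MCTS and reinforcement learning, bringing the plot quality of a small (3B-parameter) model close to that of a much larger (70B) model on select short-story tasks.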

Findings

01. COS(M+O)S's top story expansions are favored by participants.

02. It surpasses naive decoding from smaller models.

03. Performance is close to large models, despite capacity limits.

Abstract

We present COS(M+O)S, a System 2-inspired framework for open-ended plot development that systematically explores the vast space of possible story expansions, enabling a 3B-parameter language model to approach the plot quality of a 70B model on select short-story tasks. The method accomplishes this by combining Monte Carlo Tree Search (MCTS), guided by a step-level value model that rewards moderate surprisal (curiosity) while penalizing incoherence, and Odds Ratio Preference Optimization (ORPO) to fine-tune the policy on high-value plot expansions. This iterative reinforcement learning loop systematically explores multiple candidate plot branches, backpropagates quality signals, and adapts the policy for faster convergence, notably shifting the policy from puzzle-based Chain-of-Thought to more character-driven storytelling. In small-scale tests with short-story prompts, 67%-77% of…
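
The search loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the Gaussian-shaped reward in `curiosity_value`, its `target` and `width` parameters, the UCB1 selection rule, and the random `toy_policy` standing in for the language model are all assumptions made here for illustration. The sketch only shows the general shape of the idea, namely MCTS over plot expansions where a step-level value favors moderate surprisal and penalizes incoherence.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    story: tuple                                   # plot steps so far
    parent: Optional["Node"] = field(default=None, repr=False)
    step_value: float = 0.0                        # value assigned when the node was created
    children: list = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def curiosity_value(surprisal, coherence, target=0.5, width=0.25):
    """Hypothetical step-level value: a bump that peaks at moderate
    surprisal (target), multiplied by a coherence score in [0, 1]."""
    bump = math.exp(-((surprisal - target) ** 2) / (2 * width ** 2))
    return bump * coherence


def ucb_score(child, parent_visits, c=1.4):
    """UCB1 (an assumed selection rule): unvisited children go first."""
    if child.visits == 0:
        return math.inf
    return child.mean_value() + c * math.sqrt(math.log(parent_visits) / child.visits)


def toy_policy(story, rng, k=3):
    """Stand-in for the language model: k candidate plot steps, each with
    a random surprisal and coherence in [0, 1)."""
    return [(f"step{len(story)}-{i}", rng.random(), rng.random()) for i in range(k)]


def mcts(root_story, iterations=50, max_depth=3, seed=0):
    rng = random.Random(seed)
    root = Node(story=tuple(root_story))
    for _ in range(iterations):
        node = root
        # Selection: descend by UCB until reaching a leaf.
        while node.children:
            parent = node
            node = max(parent.children, key=lambda ch: ucb_score(ch, parent.visits))
        # Expansion: propose candidate plot steps and score each one.
        if len(node.story) - len(root.story) < max_depth:
            for text, surprisal, coherence in toy_policy(node.story, rng):
                value = curiosity_value(surprisal, coherence)
                node.children.append(Node(node.story + (text,), node, value))
            node = rng.choice(node.children)
        # Backpropagation: push the leaf's value up to the root.
        value = node.step_value
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return root


def best_path(root):
    """Follow the most-visited child at each level."""
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.story
```

Calling `best_path(mcts(["Once upon a time"]))` returns the most-visited branch as a tuple of plot steps. In a loop like the one the abstract describes, high-value branches found this way would then serve as preference data for fine-tuning the policy (ORPO in the paper); that step is not shown here.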

Taxonomy

Topics: Language and cultural evolution · Natural Language Processing Techniques · Topic Modeling

Methods: LLaMA