Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning

Louis Castricato; Alexander Havrilla; Shahbuland Matiana; Michael Pieler; Anbang Ye; Ian Yang; Spencer Frazier; Mark Riedl

arXiv:2210.07792·cs.CL·December 16, 2022·1 cites

Open Access

TL;DR

This paper introduces a contrastive reinforcement learning approach to improve controlled story generation, creating a robust preference model that aligns stories with human critiques and enhances generation quality.

Contribution

The paper develops CARP, a contrastive bi-encoder for preference modeling, and shows that using it to fine-tune a story generator improves robustness and alignment with human preferences.

Findings

1. Full pipeline outperforms larger LLMs and logit-based methods in human preference tests
2. Contrastive reward modeling enhances story generation robustness
3. Human study confirms improved quality of generated stories

Abstract

Controlled automated story generation seeks to generate natural language stories that satisfy constraints expressed as natural language critiques or preferences. Existing methods for controlling story preference rely on prompt engineering, which is labor-intensive and often inconsistent, or on logit-manipulation methods, which require annotated datasets for the desired attributes. To address these issues, we first train CARP, a contrastive bi-encoder model that aligns stories with corresponding human critiques, building a general-purpose preference model. This model is subsequently used as a reward function to fine-tune a generative language model via reinforcement learning. However, simply fine-tuning a generative language model with a contrastive reward model does not reliably produce a story generation system that meets user preferences. To…
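
The abstract's two steps (a contrastive bi-encoder that aligns stories with critiques, then its score reused as a reward) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a symmetric InfoNCE-style loss, a common choice for contrastive bi-encoders, and a cosine-similarity reward. The function names, the temperature of 0.07, and the toy embeddings are all hypothetical, and real encoders are replaced by hand-written vectors.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity_matrix(story_embs, critique_embs):
    """Pairwise cosine similarity between story and critique embeddings."""
    stories = [normalize(s) for s in story_embs]
    critiques = [normalize(c) for c in critique_embs]
    return [[sum(a * b for a, b in zip(s, c)) for c in critiques] for s in stories]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def contrastive_loss(story_embs, critique_embs, temperature=0.07):
    """Symmetric InfoNCE loss; matched (story, critique) pairs share an index.
    The temperature value is an assumption, not taken from the paper."""
    sims = cosine_similarity_matrix(story_embs, critique_embs)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))

def reward(story_emb, critique_emb):
    """Scalar reward for RL fine-tuning: cosine similarity to the target critique."""
    return cosine_similarity_matrix([story_emb], [critique_emb])[0][0]

# Toy embeddings standing in for encoder outputs (hypothetical values).
stories = [[1.0, 0.0], [0.0, 1.0]]
critiques = [[0.9, 0.1], [0.1, 0.9]]
aligned_loss = contrastive_loss(stories, critiques)
shuffled_loss = contrastive_loss(stories, critiques[::-1])
```

In this toy setup the loss is lower when each story is paired with its matching critique than when the pairs are shuffled, and `reward` returns a higher score the more closely a generated story's embedding points toward the critique it should satisfy.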

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Games

Methods: Contrastive Learning · ALIGN