Creativity Has Left the Chat: The Price of Debiasing Language Models
Behnam Mohammadi

TL;DR
This paper examines how reinforcement learning from human feedback (RLHF) reduces the creativity of large language models: aligned models produce less diverse output and cluster around specific attractor states, which limits their usefulness in creative applications.
Contribution
The paper shows that RLHF unintentionally reduces creativity in LLMs and discusses the implications for creative tasks and prompt engineering strategies.
Findings
Aligned models show lower entropy in token predictions
Aligned models form distinct clusters in the embedding space
Aligned models tend to gravitate towards "attractor states"
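
To make the first finding concrete, here is a minimal sketch of how the entropy of a next-token distribution can be computed. It is not the paper's code: the function name `token_entropy` and both example distributions are hypothetical and chosen only for illustration. A flatter distribution (more candidate tokens with meaningful probability) yields higher entropy; a peaked one yields lower entropy.

```python
import math


def token_entropy(probs):
    """Shannon entropy, in bits, of a next-token probability distribution.

    `probs` is a sequence of non-negative weights; it is normalized here so
    that slightly unnormalized inputs are still handled.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    entropy = 0.0
    for p in probs:
        if p > 0:  # zero-probability tokens contribute nothing
            q = p / total
            entropy -= q * math.log2(q)
    return entropy


# Hypothetical distributions over a four-token vocabulary.
# These numbers are made up for illustration, not results from the paper.
spread_out = [0.25, 0.25, 0.25, 0.25]  # many plausible next tokens
peaked = [0.97, 0.01, 0.01, 0.01]      # one dominant next token

if __name__ == "__main__":
    print("spread out:", token_entropy(spread_out))
    print("peaked:    ", token_entropy(peaked))
```

Under these assumptions, the peaked distribution scores well below the uniform one, which is the kind of gap the entropy finding describes between aligned and base models.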
Abstract
Large Language Models (LLMs) have revolutionized natural language processing but can exhibit biases and may generate toxic content. While alignment techniques like Reinforcement Learning from Human Feedback (RLHF) reduce these issues, their impact on creativity, defined as syntactic and semantic diversity, remains unexplored. We investigate the unintended consequences of RLHF on the creativity of LLMs through three experiments focusing on the Llama-2 series. Our findings reveal that aligned models exhibit lower entropy in token predictions, form distinct clusters in the embedding space, and gravitate towards "attractor states", indicating limited output diversity. Our findings have significant implications for marketers who rely on LLMs for creative tasks such as copywriting, ad creation, and customer persona generation. The trade-off between consistency and creativity in aligned models…
Taxonomy
Topics: Knowledge Management and Sharing · Wikis in Education and Collaboration
Methods: Balanced Selection
