CharacterFlywheel: Scaling Iterative Improvement of Engaging and Steerable LLMs in Production
Yixin Nie, Lin Guan, Zhongyao Ma, Anchit Gupta, Yipin Zhou, Xiao Li, Zhengping Zhou, Raymond Zeng, Gelin Zhou, Shigan Chu, Ajay Thampi, Wancen Mu, Nathan Shuster, Ketong Wang, Lin Chen, Jason Brewer, Derek Hao Hu, Alexander McCauley, Jason Weston, Sem Park, Na Zhang, Kevin Tang

TL;DR
CharacterFlywheel is an iterative process for improving large language models for social chat. It combines continuous deployment, data curation, reward modeling, and fine-tuning, and the report credits it with measurable gains in engagement and steerability.
Contribution
This work introduces CharacterFlywheel, an iterative process for scaling LLM improvements in production social chat applications, and evaluates it with controlled 7-day A/B tests.
Findings
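7 of 8 newly deployed models showed positive lift over the baseline in 7-day A/B tests
Up to 8.8% increase in engagement breadth
Up to 19.4% increase in engagement depth
Instruction following improved from 59.2% to 84.8%
Instruction violations decreased from 26.6% to 5.8%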
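The steerability figures are reported as before-and-after rates, so it helps to separate the change in percentage points from the relative change. The sketch below is illustrative arithmetic only: the helper name `change` is hypothetical, and the inputs are the instruction-following and instruction-violation rates quoted in the abstract. The source does not say how the engagement lifts were computed, so they are left out.

```python
def change(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in %), rounded to 1 decimal."""
    points = after - before              # absolute change in percentage points
    relative = points / before * 100.0   # change relative to the starting rate
    return round(points, 1), round(relative, 1)


# Rates as reported in the abstract (hypothetical variable names).
following = change(59.2, 84.8)    # instruction following
violations = change(26.6, 5.8)    # instruction violations

print("instruction following:", following)
print("instruction violations:", violations)
```

Under these definitions, instruction following rises by 25.6 percentage points (about 43.2% relative to its starting rate), and instruction violations fall by 20.8 percentage points (about 78.2% relative).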
Abstract
This report presents CharacterFlywheel, an iterative flywheel process for improving large language models (LLMs) in production social chat applications across Instagram, WhatsApp, and Messenger. Starting from LLaMA 3.1, we refined models across 15 generations using data from both internal and external real-user traffic. Through continuous deployments from July 2024 to April 2025, we conducted controlled 7-day A/B tests showing consistent engagement improvements: 7 of 8 newly deployed models demonstrated positive lift over the baseline, with the strongest performers achieving up to 8.8% improvement in engagement breadth and 19.4% in engagement depth. We also observed substantial gains in steerability, with instruction following increasing from 59.2% to 84.8% and instruction violations decreasing from 26.6% to 5.8%. We detail the CharacterFlywheel process which integrates data curation,…
Taxonomy
Topics: ICT in Developing Communities · Topic Modeling · Mobile Crowdsensing and Crowdsourcing
