Learning to Simulate Human Dialogue
Kanishk Gandhi, Agam Bhatia, Noah D. Goodman

TL;DR
This paper compares training methods for next-turn dialogue prediction and finds that directly maximizing the likelihood of human responses aligns models with actual human dialogue better than judge-based rewards, with the gap widening when models are allowed to think before responding.
Contribution
It compares thinking and non-thinking models on next-turn dialogue prediction and shows that likelihood maximization outperforms judge-based rewards at modeling human responses.
Findings
Likelihood maximization improves human response prediction.
Judge-based rewards increase semantic scores but reduce human-likeness.
Allowing models to think before responding can hinder performance when trained with judge-based rewards.
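Human-likeness here is measured by pairwise judgments in which people pick the more human-like of a real and a synthetic response. Below is a hypothetical sketch of summarizing such judgments as a win rate. It assumes each trial is labeled "model" or "human" according to which response the judge picked, and that ties are excluded. The labels and the function name are illustrative and do not come from the paper.

```python
from typing import Sequence


def win_rate(choices: Sequence[str]) -> float:
    """Fraction of pairwise trials in which judges picked the model's response.

    Each entry is "model" or "human" (hypothetical labels), naming the response
    the judge chose as more human-like. Ties are assumed to be excluded upstream.
    """
    if not choices:
        raise ValueError("need at least one judgment")
    unknown = set(choices) - {"model", "human"}
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    # Count trials where the synthetic (model) response was judged more human-like.
    return sum(c == "model" for c in choices) / len(choices)
```

In a two-alternative setup like this, a win rate near 0.5 means judges pick the synthetic response about as often as the real one. Lower values mean the model's responses are easier to tell apart from human ones.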
Abstract
To predict what someone will say is to model how they think. We study this through next-turn dialogue prediction: given a conversation, predict the next utterance produced by a person. We compare learning approaches along two dimensions: (1) whether the model is allowed to think before responding, and (2) how learning is rewarded, either through an LLM-as-a-judge that scores semantic similarity and information completeness relative to the ground-truth response, or by directly maximizing the log-probability of the true human dialogue. We find that optimizing for judge-based rewards does increase judge scores throughout training; however, it decreases the likelihood assigned to ground-truth human responses and lowers the win rate when human judges choose the more human-like response between a real and a synthetic option. This failure is amplified when the model is allowed to think before…
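The likelihood-based reward scores a model by the log-probability it assigns to the ground-truth human response. Below is a minimal sketch that assumes the model exposes per-position logits over a vocabulary when it scores that response with teacher forcing. The function names, the pure-Python log-softmax, and the optional length normalization are illustrative assumptions, not the paper's implementation. For a thinking model, the logits would presumably be conditioned on the dialogue context plus the model's own generated thoughts.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def likelihood_reward(
    step_logits: Sequence[Sequence[float]],
    target_ids: Sequence[int],
    normalize: bool = True,
) -> float:
    """Log-probability of the ground-truth human response (hypothetical helper).

    step_logits[t] holds the model's logits for response position t, conditioned
    on the dialogue context and the ground-truth tokens before t (teacher forcing).
    target_ids[t] is the ground-truth token id at position t.
    With normalize=True the sum is divided by the response length (an assumption).
    """
    if len(step_logits) != len(target_ids) or not target_ids:
        raise ValueError("need one logit vector per ground-truth token")
    total = 0.0
    for logits, token_id in zip(step_logits, target_ids):
        # Add the log-probability the model assigns to the true human token.
        total += log_softmax(logits)[token_id]
    return total / len(target_ids) if normalize else total
```

Unlike a judge score, this reward can only go up by putting more probability mass on what the person actually said. That is one way to read the abstract's observation that judge scores and ground-truth likelihood can move in opposite directions.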
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Multimodal Machine Learning Applications
