Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue
Jonathan Ivey, Shivani Kumar, Jiayu Liu, Hua Shen, Sushrita Rakshit, Rohan Raju, Haotian Zhang, Aparna Ananthasubramaniam, Junghwan Kim, Bowen Yi, Dustin Wright, Abraham Israeli, Anders Giovanni Møller, Lechen Zhang, David Jurgens

TL;DR
This study evaluates whether large language models can accurately simulate the qualities of human dialogue by comparing paired LLM-LLM and human-LLM interactions, revealing substantial divergence from human behavior and similar model performance across English, Chinese, and Russian.
Contribution
The paper provides a large-scale analysis of LLM-generated dialogues versus human dialogues, highlighting the limitations and divergence in style and content across languages.
Findings
Low alignment between LLM simulations and human dialogues
Models perform similarly across English, Chinese, and Russian
LLMs better simulate human responses when writing style is similar to their own
Abstract
Studying and building datasets for dialogue tasks is both expensive and time-consuming due to the need to recruit, train, and collect data from study participants. In response, much recent work has sought to use large language models (LLMs) to simulate both human-human and human-LLM interactions, as they have been shown to generate convincingly human-like text in many settings. However, to what extent do LLM-based simulations actually reflect human dialogues? In this work, we answer this question by generating a large-scale dataset of 100,000 paired LLM-LLM and human-LLM dialogues from the WildChat dataset and quantifying how well the LLM simulations align with their human counterparts. Overall, we find relatively low alignment between simulations and human interactions, demonstrating a systematic divergence along multiple textual properties, including style and content.…
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Multi-Agent Systems and Negotiation
Methods: ALIGN
