Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation
Rishikesh Devanathan, Varun Nathan, Ayush Kumar

TL;DR
This paper introduces a diagnostic framework with 17 metrics to evaluate the realism of synthetic contact center dialogues, revealing current generation methods' shortcomings in capturing authentic conversational properties.
Contribution
The paper presents a comprehensive diagnostic evaluation framework and benchmarks multiple generation strategies, highlighting the gaps in synthetic dialogue realism that affect downstream tasks.
Findings
On the automated quality assurance (AutoQA) task, prompts optimized on real transcripts consistently outperform those optimized on synthetic transcripts.
Current generation methods fall short on sentiment fidelity and conversational realism.
Structured supervision improves some aspects but does not fully bridge the realism gap.
Abstract
Synthetic data is increasingly critical for contact centers, where privacy constraints and data scarcity limit the availability of real conversations. However, generating synthetic dialogues that are realistic and useful for downstream applications remains challenging. In this work, we benchmark multiple generation strategies guided by structured supervision on call attributes (Intent Summaries, Topic Flows, and Quality Assurance (QA) Forms) across multiple languages. To test downstream utility, we evaluate synthetic transcripts on an automated quality assurance (AutoQA) task, finding that prompts optimized on real transcripts consistently outperform those optimized on synthetic transcripts. These results suggest that current synthetic transcripts fall short in capturing the full realism of real agent-customer interactions. To highlight these downstream gaps, we introduce a diagnostic…
