A Survey on Recent Advances in Conversational Data Generation
Heydar Soudani, Roxana Petcu, Evangelos Kanoulas, Faegheh Hasibi

TL;DR
This survey reviews recent methods for generating synthetic conversational data, addressing challenges in dataset creation for dialogue systems and categorizing approaches across different system types.
Contribution
It provides a comprehensive framework and categorization of existing research on multi-turn conversational data generation, including evaluation and future directions.
Findings
Compared with crowdsourcing, synthetic data generation makes dataset creation more scalable and less costly.
Existing methods are categorized into seed data, utterance generation, and quality filtering.
The survey highlights current challenges and potential research opportunities.
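The three components named in the findings (seed data, utterance generation, quality filtering) can be read as stages of a pipeline. The following is only a minimal sketch of that reading, not the survey's method: the `Dialogue` class, the function names, the template-based generator, and the turn-count filter are all hypothetical stand-ins chosen for illustration. A real system would likely replace the generator with a language model and the filter with a learned or rule-based quality check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dialogue:
    seed: str          # the topic or document the dialogue is grounded in
    turns: List[str]   # alternating user/system utterances


def generate_utterances(seed: str, num_turns: int = 4) -> Dialogue:
    """Hypothetical generator: emits templated turns instead of calling a model."""
    turns = []
    for i in range(num_turns):
        speaker = "user" if i % 2 == 0 else "system"
        turns.append(f"{speaker}: turn {i + 1} about {seed}")
    return Dialogue(seed=seed, turns=turns)


def passes_quality_filter(dialogue: Dialogue, min_turns: int = 2) -> bool:
    """Toy filter (an assumption): require enough turns and no empty utterances."""
    has_enough_turns = len(dialogue.turns) >= min_turns
    no_empty_turns = all(turn.strip() for turn in dialogue.turns)
    return has_enough_turns and no_empty_turns


def build_dataset(seeds: List[str], num_turns: int = 4) -> List[Dialogue]:
    """Run seed selection, utterance generation, and quality filtering in order."""
    usable_seeds = [s for s in seeds if s.strip()]                    # stage 1: seed data
    candidates = [generate_utterances(s, num_turns) for s in usable_seeds]  # stage 2
    return [d for d in candidates if passes_quality_filter(d)]       # stage 3


if __name__ == "__main__":
    for dialogue in build_dataset(["hotel booking", "  ", "weather"]):
        print(dialogue.seed, len(dialogue.turns))
```

Keeping each stage as a separate function means any one of them can be swapped out without touching the others, which mirrors how the findings treat them as distinct components.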
Abstract
Recent advancements in conversational systems have significantly enhanced human-machine interactions across various domains. However, training these systems is challenging due to the scarcity of specialized dialogue data. Traditionally, conversational datasets were created through crowdsourcing, but this method has proven costly, limited in scale, and labor-intensive. As a solution, the development of synthetic dialogue data has emerged, utilizing techniques to augment existing datasets or convert textual resources into conversational formats, providing a more efficient and scalable approach to dataset creation. In this survey, we offer a systematic and comprehensive review of multi-turn conversational data generation, focusing on three types of dialogue systems: open domain, task-oriented, and information-seeking. We categorize the existing research based on key components like seed…
Taxonomy
Topics: Speech and dialogue systems · Advanced Text Analysis Techniques · AI in Service Interactions
