DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications

Sathya Krishnan Suresh; Wu Mengjun; Tushar Pranav; Eng Siong Chng

arXiv:2409.19020·cs.CL·February 11, 2025

TL;DR

DiaSynth is a novel framework that leverages Large Language Models and Chain of Thought reasoning to generate high-quality synthetic dialogues across various domains, addressing data scarcity in dialogue system development.

Contribution

The paper introduces a synthetic dialogue generation framework that uses LLMs and CoT reasoning, outperforming traditional data collection methods and capturing most (90.48%) of the performance achieved with in-domain data.

Findings

01. Pretrained language models fine-tuned on the synthetic data outperform their base models by 16.47% on dialogue summarization.

02. Synthetic data captures 90.48% of in-domain data performance.

03. Larger LLMs (8B) produce higher-quality synthetic dialogues.

Abstract

The scarcity of domain-specific dialogue datasets limits the development of dialogue systems across applications. Existing research is constrained by general or niche datasets that lack sufficient scale for training dialogue systems. To address this gap, we introduce DiaSynth - a synthetic dialogue generation framework capable of generating high-quality, contextually rich dialogues across a wide range of domains. Unlike existing frameworks, DiaSynth uses Large Language Models (LLMs) and Chain of Thought (CoT) reasoning to generate dynamic, domain-specific dialogues with simulated personas and diverse conversational features. We perform our experiments by generating synthetic data using different LLMs and few-shot examples from DialogSum and SAMSum. The pretrained language models fine-tuned on the synthetic data outperform the base models by 16.47% on dialogue summarization, while the…
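The generation step described in the abstract (simulated personas, few-shot examples, and Chain of Thought reasoning) can be illustrated with a minimal prompt-assembly sketch. This is not the authors' implementation: the `Persona` class, the `build_dialogue_prompt` function, the prompt wording, and the example dialogue are all hypothetical, and the call to an actual LLM is left out.

```python
from dataclasses import dataclass


@dataclass
class Persona:
    # Hypothetical persona record; the field names are illustrative only.
    name: str
    description: str


def build_dialogue_prompt(domain, personas, few_shot_examples):
    """Assemble a CoT-style prompt asking an LLM for one synthetic dialogue."""
    lines = [f"Domain: {domain}", "", "Speakers:"]
    for persona in personas:
        lines.append(f"- {persona.name}: {persona.description}")
    lines.append("")
    # Few-shot examples show the model the expected dialogue format.
    for i, example in enumerate(few_shot_examples, start=1):
        lines.append(f"Example dialogue {i}:")
        lines.append(example.strip())
        lines.append("")
    # The CoT instruction asks for explicit reasoning before the dialogue.
    lines.append(
        "First, reason step by step about the setting, each speaker's goal, "
        "and the tone of the conversation."
    )
    lines.append("Then write the dialogue, one turn per line, as 'Name: utterance'.")
    return "\n".join(lines)


# Placeholder inputs, invented for illustration (not taken from DialogSum or SAMSum).
prompt = build_dialogue_prompt(
    "healthcare appointment scheduling",
    [
        Persona("Alex", "a patient rescheduling a check-up"),
        Persona("Sam", "a clinic receptionist"),
    ],
    ["Alex: Hi, can I move my appointment?\nSam: Sure, which day works for you?"],
)
print(prompt)
```

The resulting string would be sent to whichever LLM is used for generation; varying the domain, personas, and few-shot examples is one way to obtain diverse dialogues from the same template.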

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · AI in Service Interactions

Methods: Balanced Selection