Syn-TurnTurk: A Synthetic Dataset for Turn-Taking Prediction in Turkish Dialogues
Ahmet Tuğrul Bayrak, Mustafa Sertaç Türkel, Fatma Nur Korkmaz

TL;DR
This paper presents Syn-TurnTurk, a synthetic Turkish dialogue dataset generated with large language models (LLMs) to improve turn-taking prediction in Turkish voice chatbots.
Contribution
The paper introduces a new synthetic dataset for turn-taking prediction in Turkish, supporting more natural dialogue management in voice-based systems.
Findings
Advanced models, particularly BI-LSTM and an Ensemble (LR+RF), achieve the best results, with an accuracy of 0.839 and an AUC of 0.910.
The dataset effectively captures linguistic cues for turn-taking.
Synthetic data improves model performance in Turkish dialogue scenarios.
Abstract
Managing natural dialogue timing is a significant challenge for voice-based chatbots. Most current systems rely on simple silence detection, which often fails because human speech contains irregular pauses. As a result, bots interrupt users and break the conversational flow. The problem is even more severe for languages such as Turkish, which lack high-quality datasets for turn-taking prediction. This paper introduces Syn-TurnTurk, a synthetic Turkish dialogue dataset generated with several Qwen large language models (LLMs) to mirror real-life verbal exchanges, including overlaps and strategic silences. We evaluate the dataset with several traditional and deep learning architectures. The results show that advanced models, particularly BI-LSTM and Ensemble (LR+RF) methods, achieve high accuracy (0.839) and AUC scores (0.910). These findings demonstrate that our…
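To make the silence-detection failure described in the abstract concrete, here is a minimal sketch of a naive silence-threshold endpointer running over a hypothetical stream of speech and silence segments. It illustrates the baseline the paper argues against, not the authors' method. The `Segment` type, the function names, the 500 ms and 800 ms thresholds, and the segment durations are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    kind: str          # "speech" or "silence" (hypothetical VAD output)
    duration_ms: int


def silence_endpoint(segments, threshold_ms=500):
    """Time (ms) at which a naive detector ends the user's turn:
    the first moment a silence has lasted threshold_ms.
    Returns None if no silence is long enough."""
    elapsed = 0
    for seg in segments:
        if seg.kind == "silence" and seg.duration_ms >= threshold_ms:
            return elapsed + threshold_ms
        elapsed += seg.duration_ms
    return None


def utterance_end(segments):
    """Time (ms) at which the last speech segment actually ends."""
    elapsed, end = 0, 0
    for seg in segments:
        elapsed += seg.duration_ms
        if seg.kind == "speech":
            end = elapsed
    return end


# Hypothetical user turn: speech, a 700 ms hesitation pause, more speech,
# then the real end-of-turn silence.
segs = [
    Segment("speech", 1200),
    Segment("silence", 700),
    Segment("speech", 900),
    Segment("silence", 1000),
]

print(silence_endpoint(segs, 500))  # 1700: fires during the hesitation pause
print(utterance_end(segs))          # 2800: when the user actually stops
print(silence_endpoint(segs, 800))  # 3600: no interruption, but 800 ms of lag
```

With a 500 ms threshold the detector ends the turn at 1700 ms, in the middle of a hesitation pause, although the user keeps speaking until 2800 ms. Raising the threshold to 800 ms avoids that interruption but delays the response until 3600 ms. A learned turn-taking model that also uses linguistic cues, as the paper proposes, aims to avoid this interruption-versus-latency trade-off.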
