Syn-TurnTurk: A Synthetic Dataset for Turn-Taking Prediction in Turkish Dialogues

Ahmet Tuğrul Bayrak; Mustafa Sertaç Türkel; Fatma Nur Korkmaz

arXiv:2604.13620·cs.CL·April 16, 2026

TL;DR

This paper presents Syn-TurnTurk, a synthetic Turkish dialogue dataset generated with Qwen LLMs to support turn-taking prediction in Turkish voice chatbots.

Contribution

The paper introduces a synthetic dataset for Turkish turn-taking prediction, intended to enable more natural dialogue management in voice-based systems.

Findings

01  BI-LSTM and Ensemble (LR+RF) models achieve high accuracy (0.839) and AUC (0.910).

02  The dataset effectively captures linguistic cues for turn-taking.

03  Synthetic data improves model performance in Turkish dialogue scenarios.

Abstract

Managing natural dialogue timing is a significant challenge for voice-based chatbots. Most current systems rely on simple silence detection, which often fails because human speech involves irregular pauses. As a result, bots interrupt users and break the conversational flow. The problem is even more severe for languages like Turkish, which lack high-quality datasets for turn-taking prediction. This paper introduces Syn-TurnTurk, a synthetic Turkish dialogue dataset generated using various Qwen Large Language Models (LLMs) to mirror real-life verbal exchanges, including overlaps and strategic silences. We evaluated the dataset using several traditional and deep learning architectures. The results show that advanced models, particularly BI-LSTM and Ensemble (LR+RF) methods, achieve high accuracy (0.839) and AUC scores (0.910). These findings demonstrate that our…
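The failure mode the abstract attributes to silence detection can be illustrated with a small sketch. This is not the paper's method or data: the segment format, the function name `first_silence_trigger`, the threshold values, and the example turn are all hypothetical, chosen only to show how a fixed silence threshold can fire on a mid-utterance pause.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a user turn is a list of (kind, duration_ms)
# segments, where kind is either "speech" or "pause".
Segment = Tuple[str, int]


def first_silence_trigger(segments: List[Segment], threshold_ms: int) -> Optional[int]:
    """Return the time (ms from turn start) at which a fixed-threshold
    silence detector would take the turn, or None if no pause inside
    the turn reaches the threshold."""
    elapsed = 0
    for kind, duration in segments:
        if kind == "pause" and duration >= threshold_ms:
            # The detector fires as soon as the silence has lasted threshold_ms.
            return elapsed + threshold_ms
        elapsed += duration
    return None


def turn_length(segments: List[Segment]) -> int:
    """Total duration of the turn in milliseconds."""
    return sum(duration for _, duration in segments)


# Illustrative (made-up) turn: the speaker hesitates for 900 ms, then continues.
user_turn = [("speech", 1200), ("pause", 900), ("speech", 1500)]

trigger = first_silence_trigger(user_turn, threshold_ms=700)
# The bot interrupts if it takes the turn before the user has finished.
interrupted = trigger is not None and trigger < turn_length(user_turn)
print(trigger, interrupted)
```

With the 700 ms threshold the detector fires during the hesitation, before the user has finished; raising the threshold avoids that interruption but delays every legitimate response, which is the trade-off that motivates learned turn-taking predictors.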

Code & Models

Datasets

tugrulbayrak/Syn-TurnTurk
dataset · 89 downloads