When2Speak: A Dataset for Temporal Participation and Turn-Taking in Multi-Party Conversations for Large Language Models
Vihaan Nama, Shreya Mendi, Zian Ye, Brinnae Bent

TL;DR
This paper introduces When2Speak, a large synthetic dataset for training LLMs to better decide when to speak in multi-party conversations, improving their participation timing and coherence.
Contribution
The creation of a grounded synthetic dataset and a four-stage pipeline for learning intervention timing in group conversations, with open-source code for reproducibility.
Findings
Supervised fine-tuning on When2Speak significantly improves intervention prediction.
Reinforcement learning reduces missed interventions and increases recall.
Grounded synthetic data effectively enhances multi-party conversational skills.
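To make the second finding concrete, here is a minimal sketch of how recall on the SPEAK class and the count of missed interventions could be computed from per-turn predictions. The function names and the toy label lists are illustrative assumptions, not the paper's evaluation code or results.

```python
from typing import List, Tuple


def speak_recall(gold: List[str], pred: List[str]) -> Tuple[float, int]:
    """Return (recall on SPEAK, number of missed interventions).

    A missed intervention is a turn where the gold label is SPEAK
    but the model predicted SILENT (a false negative).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    true_pos = sum(1 for g, p in zip(gold, pred) if g == "SPEAK" and p == "SPEAK")
    missed = sum(1 for g, p in zip(gold, pred) if g == "SPEAK" and p == "SILENT")
    total = true_pos + missed
    recall = true_pos / total if total else 0.0
    return recall, missed


# Toy labels, invented purely for illustration.
gold = ["SPEAK", "SILENT", "SPEAK", "SILENT", "SPEAK"]
pred = ["SPEAK", "SILENT", "SILENT", "SPEAK", "SPEAK"]
recall, missed = speak_recall(gold, pred)
```

Under this definition, fewer missed interventions means higher recall, since every missed turn is a false negative on the SPEAK class.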
Abstract
Large Language Models (LLMs) excel at generating contextually appropriate responses but remain poorly calibrated for multi-party conversations, where deciding when to speak is as critical as what to say. In such settings, naively responding at every turn leads to excessive interruptions and degraded conversational coherence. We introduce When2Speak, a grounded synthetic dataset and four-stage generation pipeline for learning intervention timing in group interactions. The dataset comprises over 215,000 examples derived from 16,000 conversations involving 2-6 speakers, spanning diverse conversational styles, tones, and participant dynamics, and explicitly modeling SPEAK vs. SILENT decisions at each turn. Our pipeline combines real-world grounding, structured augmentation, controlled transcript synthesis, and fine-tuning-ready supervision, and is fully open-sourced to support…
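As an illustration of what a SPEAK vs. SILENT decision at each turn could look like as supervision, the sketch below converts one transcript into per-turn examples. The schema (`DecisionExample`), the `build_examples` helper, the labeling rule (SPEAK when the next turn belongs to the target participant), and the sample transcript are all assumptions made for this sketch; the dataset's actual format may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)


@dataclass
class DecisionExample:
    context: List[Turn]  # turns observed before the decision point
    label: str           # "SPEAK" or "SILENT"


def build_examples(transcript: List[Turn], target: str) -> List[DecisionExample]:
    """Create one decision example per turn after the first.

    Assumed rule: the label is SPEAK if the target participant
    actually took the next turn, otherwise SILENT.
    """
    examples = []
    for i in range(1, len(transcript)):
        next_speaker, _ = transcript[i]
        label = "SPEAK" if next_speaker == target else "SILENT"
        examples.append(DecisionExample(context=list(transcript[:i]), label=label))
    return examples


# Hypothetical four-turn conversation.
transcript = [
    ("Ana", "Should we move the launch to Friday?"),
    ("Ben", "I think Friday works."),
    ("Assistant", "Friday conflicts with the planned maintenance window."),
    ("Ana", "Good catch, let's do Monday."),
]
examples = build_examples(transcript, target="Assistant")
labels = [ex.label for ex in examples]
```

Each conversation yields several examples this way, one per decision point, which is how a collection of conversations can expand into a much larger set of training examples.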
