Speak or Stay Silent: Context-Aware Turn-Taking in Multi-Party Dialogue
Kratika Bhagtani, Mrinal Anand, Yu Chen Xu, Amit Kumar Singh Yadav

TL;DR
This paper addresses the challenge of turn-taking in multi-party voice conversations, proposing a context-aware method that improves AI assistant behavior by explicitly training for appropriate speaking decisions.
Contribution
The paper introduces a new benchmark dataset and a supervised fine-tuning approach with reasoning traces to enable context-aware turn-taking in multi-party dialogue systems.
Findings
The eight recent large language models evaluated consistently fail at context-aware turn-taking in multi-party settings under zero-shot prompting.
Supervised fine-tuning with reasoning traces improves balanced accuracy by up to 23 percentage points.
Explicit training is necessary for effective context-aware turn-taking, as it is not an emergent capability.
Abstract
Existing voice AI assistants treat every detected pause as an invitation to speak. This works in dyadic dialogue, but in multi-party settings, where an AI assistant participates alongside multiple speakers, pauses are abundant and ambiguous. An assistant that speaks on every pause becomes disruptive rather than useful. In this work, we formulate context-aware turn-taking: at every detected pause, given the full conversation context, our method decides whether the assistant should speak or stay silent. We introduce a benchmark of over 120K labeled conversations spanning three multi-party corpora. Evaluating eight recent large language models, we find that they consistently fail at context-aware turn-taking under zero-shot prompting. We then propose a supervised fine-tuning approach with reasoning traces, improving balanced accuracy by up to 23 percentage points. Our findings suggest that…
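The abstract reports results in balanced accuracy. As a minimal sketch, assuming the standard definition (the mean of per-class recall over the two decisions), the snippet below shows why this metric suits a task where pauses that call for silence may far outnumber those that call for speech. The label names and the toy data are hypothetical and are not taken from the paper's benchmark.

```python
from typing import Sequence

# Hypothetical label names for the two possible decisions at a detected pause.
SPEAK, SILENT = "speak", "silent"


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    recalls = []
    for label in (SPEAK, SILENT):
        # Positions where the gold decision is this label.
        idx = [i for i, t in enumerate(y_true) if t == label]
        if not idx:
            continue  # skip a class that never occurs in the gold labels
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    if not recalls:
        raise ValueError("y_true contains no known labels")
    return sum(recalls) / len(recalls)


# Toy example (made up): 8 pauses where silence is correct, 2 where speaking is.
gold = [SILENT] * 8 + [SPEAK] * 2
# An assistant that treats every pause as an invitation to speak.
always_speak = [SPEAK] * 10

print(balanced_accuracy(gold, always_speak))  # 0.5
```

Under this definition, the speak-on-every-pause policy scores 0.5 regardless of class imbalance, because it gets every speak case right and every silent case wrong. Plain accuracy on the same toy data would be 0.2, a figure that reflects the class ratio rather than decision quality.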
Code & Models
- 🤗 ishiki-labs/qwen3-8b-ami · model · 56 dl
- 🤗 ishiki-labs/qwen2.5-7b-ami · model · 55 dl
- 🤗 ishiki-labs/mistral-7b-ami · model · 45 dl
- 🤗 ishiki-labs/llama3.1-8b-ami · model · 46 dl
- 🤗 ishiki-labs/gpt-oss-20b-ami · model · 46 dl
- 🤗 ishiki-labs/qwen3-8b-friends · model · 49 dl
- 🤗 ishiki-labs/qwen2.5-7b-friends · model · 41 dl
- 🤗 ishiki-labs/mistral-7b-friends · model · 48 dl
- 🤗 ishiki-labs/llama3.1-8b-friends · model · 56 dl
- 🤗 ishiki-labs/gpt-oss-20b-friends · model · 46 dl
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · AI in Service Interactions
