Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models
Pablo Riera, Pablo Brusco, Cristina Kuo, Marcelo Sancinetti, and S.R.K. Branavan

TL;DR
This paper investigates how full-duplex speech dialogue models synchronize internal states and anticipate turn-taking, drawing inspiration from neural coupling in human communication, and measures these phenomena under various conditions.
Contribution
It introduces a method to analyze internal synchronization and anticipatory cues in full-duplex dialogue models, demonstrating their presence and robustness under different noise levels.
Findings
Strong representational synchronization occurs in noise-free conditions.
Internal states encode anticipatory information for turn-taking prediction.
Synchronization degrades as channel noise increases.
Abstract
Full-duplex spoken dialogue models (SDMs) can listen and speak simultaneously, enabling interaction dynamics closer to human conversation than turn-based systems. Inspired by neural coupling in human communication, we study how such models coordinate their internal representations during interaction. We simulate full-duplex dialogues between two instances of the pretrained Moshi model under controlled conditions, manipulating channel noise and decoding bias. Synchronization is measured using Centered Kernel Alignment (CKA) across temporal lags, while anticipatory turn-taking cues are probed from delayed internal activations using causal LSTM models, from both speaker and listener perspectives. We find strong representational synchronization under no noise conditions, peaking near zero lag and degrading with noise, and we show that internal states encode anticipatory information…
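To make the synchronization measure concrete, the following is a minimal sketch of CKA evaluated across temporal lags. It assumes the linear variant of CKA and activation matrices of shape (time steps, features). The excerpt above does not say which CKA kernel, which layers, or which lag range the authors use, so the function names `linear_cka` and `lagged_cka`, the lag range, and the synthetic data are all illustrative assumptions rather than the paper's implementation.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (time, features)."""
    # Center each feature over time so the score reflects covariance structure.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return cross / (norm_x * norm_y)


def lagged_cka(a, b, max_lag):
    """CKA between a[t] and b[t + lag] for every lag in [-max_lag, max_lag]."""
    n = min(len(a), len(b))
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xa, xb = a[: n - lag], b[lag:n]
        else:
            xa, xb = a[-lag:n], b[: n + lag]
        scores[lag] = linear_cka(xa, xb)
    return scores


# Synthetic stand-in for two model instances (assumption, not real Moshi
# activations): b is a rotated copy of a, delayed by 3 steps, plus noise.
rng = np.random.default_rng(0)
a = rng.standard_normal((500, 16))
rotation, _ = np.linalg.qr(rng.standard_normal((16, 16)))
b = np.roll(a, 3, axis=0) @ rotation + 0.1 * rng.standard_normal((500, 16))

scores = lagged_cka(a, b, max_lag=5)
best_lag = max(scores, key=scores.get)
```

In this toy setup the score should be highest at the lag that matches the injected delay, because linear CKA is invariant to orthogonal transformations of the features. Raising the noise scale lowers the peak, which loosely mirrors the degradation with channel noise described in the abstract.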
