How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue
Hui Lu, Xueyuan Chen, Huimeng Wang, Shuhai Peng, Shiyin Kang, Xixin Wu, Zhiyong Wu

TL;DR
This paper investigates how to effectively route user input into large language models for full-duplex spoken dialogue, comparing channel fusion and cross-attention strategies through experiments on question answering and interaction benchmarks.
Contribution
It extends a text-only LLM into a unified full-duplex spoken dialogue system and systematically compares two user-stream routing strategies under a shared training pipeline, highlighting their tradeoff between semantic grounding and robustness.
Findings
Channel fusion improves question-answering performance and semantic grounding.
Cross-attention routing offers better robustness to user interruptions and context corruption.
The two routing strategies trade semantic integration strength against context robustness.
Abstract
Full-duplex spoken dialogue requires a model to keep listening while generating its own spoken response. This is challenging for large language models (LLMs), which are designed to extend a single coherent sequence and do not naturally support user input arriving during generation. We argue that how the user stream is routed into the LLM is therefore a key architectural question for full-duplex modeling. To study this question, we extend a text-only LLM into a unified full-duplex spoken dialogue system and compare two routing strategies under a shared training pipeline: (i) channel fusion, which injects the user stream directly into the LLM input, and (ii) cross-attention routing, which keeps the user stream as external memory accessed through cross-attention adapters. Experiments on spoken question answering and full-duplex interaction benchmarks reveal a clear tradeoff. Channel fusion…
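To make the two routing strategies in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function names, the additive form of fusion, the single-head attention, the scalar `gate`, and the assumption that user frames are time-aligned with the model's own positions are all illustrative choices that the abstract does not specify.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_fusion(assistant_emb, user_emb):
    """Hypothetical channel fusion: the user stream enters the LLM input itself.

    Assumption: both streams are frame-aligned with shape (T, d) and are
    combined by addition. The abstract only says the user stream is injected
    directly into the LLM input.
    """
    return assistant_emb + user_emb

def cross_attention_route(hidden, user_mem, w_q, w_k, w_v, gate):
    """Hypothetical cross-attention adapter: the user stream stays external.

    hidden:   (T, d) hidden states of the LLM at some layer
    user_mem: (T, d) user-stream frames kept as external memory
    gate:     scalar weight on the adapter's residual contribution
    """
    q = hidden @ w_q
    k = user_mem @ w_k
    v = user_mem @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Causal mask: position t may only read user frames 0..t (streaming input).
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return hidden + gate * (softmax(scores, axis=-1) @ v)
```

In this sketch, fusion changes what the LLM sees at every input position, whereas the adapter adds a gated residual on top of unchanged inputs: with `gate = 0.0` the hidden states pass through untouched. That structural difference is one way to picture the tradeoff the abstract reports, though the sketch itself demonstrates nothing about benchmark behavior.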
