How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue

Hui Lu; Xueyuan Chen; Huimeng Wang; Shuhai Peng; Shiyin Kang; Xixin Wu; Zhiyong Wu

arXiv:2605.10199 · cs.CL · May 12, 2026

TL;DR

This paper investigates how the user stream should be routed into a large language model for full-duplex spoken dialogue. It compares channel fusion and cross-attention routing through experiments on spoken question answering and full-duplex interaction benchmarks.

Contribution

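The paper extends a text-only LLM into a unified full-duplex spoken dialogue system and systematically compares two user-stream routing strategies, channel fusion and cross-attention routing, under a shared training pipeline, highlighting their tradeoffs in semantic grounding and robustness.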

Findings

01

Channel fusion improves question-answering performance and semantic grounding.

02

Cross-attention routing offers better robustness to user interruptions and context corruption.

03

The two routing strategies trade semantic integration strength against context robustness.

Abstract

Full-duplex spoken dialogue requires a model to keep listening while generating its own spoken response. This is challenging for large language models (LLMs), which are designed to extend a single coherent sequence and do not naturally support user input arriving during generation. We argue that how the user stream is routed into the LLM is therefore a key architectural question for full-duplex modeling. To study this question, we extend a text-only LLM into a unified full-duplex spoken dialogue system and compare two routing strategies under a shared training pipeline: (i) channel fusion, which injects the user stream directly into the LLM input, and (ii) cross-attention routing, which keeps the user stream as external memory accessed through cross-attention adapters. Experiments on spoken question answering and full-duplex interaction benchmarks reveal a clear tradeoff. Channel fusion…
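
To make the two routing strategies concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that channel fusion sums time-aligned assistant and user frame embeddings before they enter the LLM, and that cross-attention routing lets each hidden state attend causally over the user frames seen so far and adds the result through a fixed scalar gate. The function names, the summation, the identity projections, and the `gate` parameter are all hypothetical choices made for illustration.

```python
import math


def channel_fusion(assistant_emb, user_emb):
    """Assumed fusion rule: sum time-aligned frame embeddings of both streams.

    The fused sequence is what the LLM would consume as its input.
    """
    return [
        [a + u for a, u in zip(a_t, u_t)]
        for a_t, u_t in zip(assistant_emb, user_emb)
    ]


def _softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cross_attention_routing(hidden, user_memory, gate=0.5):
    """Assumed adapter: the user stream stays outside the LLM input.

    Each hidden state at step t attends over user frames 0..t (causal),
    and the attended context is added back with a scalar gate.
    Learned query/key/value projections are omitted (identity) for brevity.
    """
    scale = 1.0 / math.sqrt(len(hidden[0]))
    routed = []
    for t, h in enumerate(hidden):
        visible = user_memory[: t + 1]
        weights = _softmax([_dot(h, u) * scale for u in visible])
        context = [
            sum(w * u[d] for w, u in zip(weights, visible))
            for d in range(len(h))
        ]
        routed.append([h_d + gate * c_d for h_d, c_d in zip(h, context)])
    return routed


# Two time steps, two-dimensional toy embeddings.
assistant = [[1.0, 0.0], [0.0, 1.0]]
user = [[0.5, 0.5], [1.0, -1.0]]

fused = channel_fusion(assistant, user)
routed = cross_attention_routing(assistant, user, gate=0.5)
```

In this sketch the fused sequence changes the LLM's input at every step, whereas setting `gate=0.0` in the cross-attention variant leaves the hidden states untouched. That difference is one way to picture why an external-memory route might isolate the model's own context from the user stream, though the sketch itself demonstrates nothing about the benchmark results.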
