RelayS2S: A Dual-Path Speculative Generation for Real-Time Dialogue
Long Mai

TL;DR
RelayS2S is a hybrid real-time dialogue system that combines a fast speculative response draft with a high-quality slow response, achieving low latency without sacrificing response quality.
Contribution
It introduces a dual-path architecture with a speculative fast path and a high-quality slow path, enabling real-time dialogue with minimal latency and high response quality.
Findings
Achieves P90 onset latency comparable to end-to-end models
Retains 99% of cascaded pipeline response quality
Benefits scale as the slow-path model size increases
Abstract
Real-time spoken dialogue systems face a fundamental tension between latency and response quality. End-to-end speech-to-speech (S2S) models respond immediately and naturally handle turn-taking, backchanneling, and interruption, but produce semantically weaker outputs. Cascaded pipelines (ASR -> LLM) deliver stronger responses at the cost of latency that grows with model size. We present RelayS2S, a hybrid architecture that runs two paths in parallel upon turn detection. The fast path -- a duplex S2S model -- speculatively drafts a short response prefix that is streamed immediately to TTS for low-latency audio onset, while continuing to monitor live audio events. The slow path -- a cascaded ASR -> LLM pipeline -- generates a higher-quality continuation conditioned on the committed prefix, producing a seamless utterance. A lightweight learned verifier gates the handoff, committing the…
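The relay described in the abstract can be sketched as follows. This is a minimal, hypothetical Python sketch using asyncio, not the authors' implementation. All component functions (`fast_path_draft`, `asr_transcribe`, `llm_continue`, `verifier_score`) are invented stubs, and the threshold-based accept/reject rule is an assumption, because the abstract is cut off before it says exactly what the verifier commits. The sketch only illustrates the control flow: both paths start together at turn detection, the fast draft is gated and streamed first, and the slow path continues from whatever prefix was committed.

```python
import asyncio

# All components below are hypothetical stubs standing in for the real models.

async def fast_path_draft(user_audio: str) -> str:
    """Stub for the duplex S2S fast path: returns a short response prefix quickly."""
    await asyncio.sleep(0.01)  # simulated low latency
    return "Sure,"

async def asr_transcribe(user_audio: str) -> str:
    """Stub for the slow-path ASR stage (the input is already text here)."""
    await asyncio.sleep(0.05)
    return user_audio

async def llm_continue(transcript: str, prefix: str) -> str:
    """Stub for the slow-path LLM: continues the committed prefix, if any."""
    await asyncio.sleep(0.05)
    # Continue after the prefix when one was committed; otherwise answer from scratch.
    return " here is the forecast." if prefix else "Here is the forecast."

def verifier_score(prefix: str) -> float:
    """Placeholder for the learned verifier (assumed to output a score in [0, 1])."""
    return 0.9 if prefix.strip() else 0.0

async def relay_turn(user_audio: str, tts_sink: list[str], threshold: float = 0.5) -> str:
    # On turn detection, launch both paths concurrently.
    draft_task = asyncio.create_task(fast_path_draft(user_audio))
    asr_task = asyncio.create_task(asr_transcribe(user_audio))

    draft = await draft_task
    # Assumed gating rule: commit the draft only if the verifier score clears the threshold.
    prefix = draft if verifier_score(draft) >= threshold else ""
    if prefix:
        tts_sink.append(prefix)  # stream the committed prefix to TTS right away

    transcript = await asr_task
    continuation = await llm_continue(transcript, prefix)
    tts_sink.append(continuation)  # slow-path text follows the prefix seamlessly
    return prefix + continuation

if __name__ == "__main__":
    spoken: list[str] = []
    print(asyncio.run(relay_turn("what's the weather tomorrow", spoken)))
```

With the default threshold, the stub draft is committed and the combined utterance is `Sure, here is the forecast.`. Raising `threshold` to 0.95 makes the stub verifier reject the draft, so only the slow-path response is spoken.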
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Speech Recognition and Synthesis
