TiCo: Time-Controllable Spoken Dialogue Model
Kai-Wei Chang, Wei-Chih Chen, En-Pei Hu, Hung-yi Lee, James Glass

TL;DR
TiCo is a time-controllable spoken dialogue model that generates spoken responses of a specified duration. It estimates elapsed speaking time and adjusts its output accordingly, which can improve interaction quality in voice systems.
Contribution
The paper introduces TiCo, the first time-aware spoken dialogue model (SDM), together with the TiCo-Bench benchmark. Duration control is enabled through Spoken Time Markers and efficient post-training that requires no paired data.
Findings
TiCo reduces duration error by 2.7x compared to its backbone.
TiCo outperforms baselines in maintaining target response durations.
TiCo preserves response quality while controlling speaking time.
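To make the "2.7x" figure concrete, here is a minimal sketch of how a duration error and its reduction factor could be computed. It assumes the error is the mean absolute difference between target and actual response durations; the page does not state how the paper defines its metric. The function names are hypothetical and the numbers are made-up placeholders, not results from the paper.

```python
def mean_abs_duration_error(targets_s, actuals_s):
    """Mean absolute gap (in seconds) between target and actual durations.

    Assumption: this is one plausible definition of "duration error";
    the paper's exact metric may differ.
    """
    if len(targets_s) != len(actuals_s) or not targets_s:
        raise ValueError("need two non-empty lists of equal length")
    gaps = [abs(t - a) for t, a in zip(targets_s, actuals_s)]
    return sum(gaps) / len(gaps)


def reduction_factor(baseline_error, model_error):
    """How many times smaller the model's error is than the baseline's."""
    if model_error <= 0:
        raise ValueError("model_error must be positive")
    return baseline_error / model_error


# Placeholder durations in seconds (illustrative only, not from the paper).
targets = [10.0, 15.0, 30.0]
backbone_actuals = [18.0, 23.0, 41.0]
controlled_actuals = [12.0, 13.0, 34.0]

backbone_error = mean_abs_duration_error(targets, backbone_actuals)
controlled_error = mean_abs_duration_error(targets, controlled_actuals)
factor = reduction_factor(backbone_error, controlled_error)
print(f"backbone: {backbone_error:.2f}s, controlled: {controlled_error:.2f}s, "
      f"reduction: {factor:.2f}x")
```

Under this definition, a reduction factor of 2.7 would mean the controlled model's average miss is roughly 37% of the backbone's.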
Abstract
We introduce TiCo, a time-controllable spoken dialogue model (SDM) that follows time-constrained instructions (e.g., "Please generate a response lasting about 15 seconds") and generates spoken responses with controllable duration. This capability is valuable for real-world spoken language systems such as voice assistants and interactive agents, where controlling response duration can improve interaction quality. However, despite their strong ability to generate natural spoken responses, existing models lack time awareness and struggle to follow duration-related instructions. To systematically evaluate this, we introduce TiCo-Bench, the first benchmark for time-controllable instruction following in SDMs, on which existing open-source and commercial models frequently fail to satisfy explicit time constraints. TiCo addresses this limitation by enabling an SDM to estimate elapsed speaking…
