TiCo: Time-Controllable Spoken Dialogue Model

Kai-Wei Chang; Wei-Chih Chen; En-Pei Hu; Hung-yi Lee; James Glass

arXiv:2603.22267 · cs.CL · May 14, 2026

TL;DR

TiCo is a time-controllable spoken dialogue model that generates responses of a specified duration by estimating and adjusting its speaking time, which can improve interaction quality in voice systems.

Contribution

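The paper introduces TiCo, a time-aware spoken dialogue model (SDM), together with TiCo-Bench, the first benchmark for time-controllable instruction following in SDMs. TiCo controls duration through Spoken Time Markers and is trained with efficient post-training that requires no paired data.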

Findings

01 TiCo reduces duration error by 2.7x compared to its backbone.

02 TiCo outperforms baselines in maintaining target response durations.

03 TiCo preserves response quality while controlling speaking time.
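
To make the notion of "duration error" in the findings concrete, here is a minimal sketch of one way such an error could be measured. The metric definition (mean relative deviation from the target), the function names, and the sample durations are all assumptions made for illustration; this page does not state which metric the paper actually uses.

```python
from typing import Iterable, Tuple


def relative_duration_error(target_s: float, actual_s: float) -> float:
    """Absolute deviation from the target, as a fraction of the target.

    Hypothetical metric: the paper's exact definition is not given here.
    """
    if target_s <= 0:
        raise ValueError("target duration must be positive")
    return abs(actual_s - target_s) / target_s


def mean_relative_duration_error(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average the relative error over (target_s, actual_s) pairs."""
    errors = [relative_duration_error(t, a) for t, a in pairs]
    if not errors:
        raise ValueError("at least one (target, actual) pair is required")
    return sum(errors) / len(errors)


# Made-up durations in seconds, NOT results from the paper.
examples = [(15.0, 18.0), (30.0, 27.0), (10.0, 10.0)]
mean_error = mean_relative_duration_error(examples)
print(f"mean relative duration error: {mean_error:.3f}")
```

Under a definition like this, a "2.7x" reduction would mean the backbone's mean error divided by TiCo's mean error is about 2.7 on the same set of instructions.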

Abstract

We introduce TiCo, a time-controllable spoken dialogue model (SDM) that follows time-constrained instructions (e.g., "Please generate a response lasting about 15 seconds") and generates spoken responses with controllable duration. This capability is valuable for real-world spoken language systems such as voice assistants and interactive agents, where controlling response duration can improve interaction quality. However, despite their strong ability to generate natural spoken responses, existing models lack time awareness and struggle to follow duration-related instructions. To systematically evaluate this, we introduce TiCo-Bench, the first benchmark for time-controllable instruction following in SDMs, on which existing open-source and commercial models frequently fail to satisfy explicit time constraints. TiCo addresses this limitation by enabling an SDM to estimate elapsed speaking…

Code & Models

Datasets

WeiChihChen/TiCo-Bench
dataset · 836 downloads
