ConvFill: Model Collaboration for Responsive Conversational Voice Agents

Vidya Srinivas; Zachary Englhardt; Maximus Powers; Shwetak Patel; Vikram Iyer

arXiv:2511.07397 · cs.CL · November 11, 2025

Open Access

TL;DR

ConvFill pairs a small on-device model with a cloud model to build conversational voice agents that respond with low latency while still drawing on the larger model's knowledge for accurate answers.

Contribution

The paper proposes conversational infill, a task in which an on-device model generates contextually appropriate responses while integrating knowledge streamed from a backend model, improving both responsiveness and access to knowledge.

Findings

01. ConvFill achieves 36-42% accuracy improvements over standalone small models.
02. ConvFill maintains sub-200 ms response latency in evaluations.
03. Conversational infill is learned effectively across multiple backend models.

Abstract

Deploying conversational voice agents with large language models faces a critical challenge: cloud-based foundation models provide deep reasoning and domain knowledge but introduce latency that disrupts natural conversation, while on-device models respond immediately but lack sophistication. We propose conversational infill, a task where a lightweight on-device model generates contextually appropriate dialogue while seamlessly incorporating streaming knowledge from a powerful backend model. This approach decouples response latency from model capability, enabling systems that feel responsive while accessing the full power of large-scale models. We present ConvFill, a 360M parameter model trained on synthetic multi-domain conversations. Evaluation across multiple backend models shows that conversational infill can be successfully learned, with ConvFill achieving accuracy improvements of…
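The abstract describes decoupling response latency from model capability: the small model speaks immediately while knowledge streams in from the backend. This page does not describe ConvFill's actual architecture or its interface to the backend, so the following is only a toy sketch of the control flow under stated assumptions. Both models are stubs, the backend is assumed to append knowledge chunks to a shared buffer, and all names (`SharedContext`, `backend_model`, `on_device_infill`, `respond`) are hypothetical. The sketch shows that the first spoken response does not wait on backend latency.

```python
import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """Knowledge streamed from the backend so far (hypothetical structure)."""
    chunks: list[str] = field(default_factory=list)
    done: bool = False


async def backend_model(query: str, ctx: SharedContext, delay: float) -> None:
    """Stub for a slow cloud LLM that streams knowledge in chunks (assumption)."""
    for chunk in (f"the answer to '{query}'", "a supporting detail"):
        await asyncio.sleep(delay)  # simulated network + inference latency
        ctx.chunks.append(chunk)
    ctx.done = True


def on_device_infill(new_chunks: list[str]) -> str:
    """Stub for the small local model: phrase newly arrived knowledge as a
    conversational continuation (a fixed template, not a real model)."""
    return "Okay, " + "; ".join(new_chunks) + "."


async def respond(query: str, backend_delay: float = 0.05,
                  tick: float = 0.01) -> list[tuple[float, str]]:
    """Return (seconds since start, utterance) pairs for one user turn."""
    start = time.perf_counter()
    ctx = SharedContext()
    backend = asyncio.create_task(backend_model(query, ctx, backend_delay))
    # The on-device model speaks right away instead of waiting for the backend.
    turns = [(time.perf_counter() - start, "Good question, let me check.")]
    used = 0
    while not (ctx.done and used == len(ctx.chunks)):
        await asyncio.sleep(tick)
        new_chunks = ctx.chunks[used:]
        if new_chunks:  # fold in whatever the backend streamed since last tick
            turns.append((time.perf_counter() - start, on_device_infill(new_chunks)))
            used += len(new_chunks)
    await backend
    return turns


if __name__ == "__main__":
    for t, text in asyncio.run(respond("hybrid voice agents")):
        print(f"{t * 1000:6.1f} ms  {text}")
```

In this toy, raising `backend_delay` delays only the knowledge-bearing utterances; the first utterance is emitted at roughly zero elapsed time regardless. A real system would replace both stubs with actual models and would have to decide how the local model phrases partial knowledge, which is the part ConvFill is trained to learn.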

Taxonomy

Topics: AI in Service Interactions · Topic Modeling · Speech Recognition and Synthesis