RESPOND: Responsive Engagement Strategy for Predictive Orchestration and Dialogue
Meng-Chen Lee, Costas Panay, Javier Hernandez, Sean Andrist, Dan Bohus, Anatoly Churikov, Andrew D. Wilson

TL;DR
RESPOND is a framework that enhances voice-based agents with timely backchannels and proactive turn claims, enabling more natural, fluid, and socially adaptable dialogue through predictive orchestration and controllability.
Contribution
RESPOND introduces a predictive system for turn-taking and engagement in conversational agents, with two adjustable parameters that let designers tune its behavior for social appropriateness.
Findings
Continuous prediction of when and how to interject enables fluid, listener-aware dialogue.
Two designer-facing dials, Backchannel Intensity and Turn Claim Aggressiveness, control engagement style.
Improves naturalness and social adaptability of voice agents.
Abstract
The majority of voice-based conversational agents still rely on pause-and-respond turn-taking, leaving interactions sounding stiff and robotic. We present RESPOND (Responsive Engagement Strategy for Predictive Orchestration and Dialogue), a framework that brings two staples of human conversation to agents: timely backchannels ("mm-hmm," "right") and proactive turn claims that can contribute relevant content before the speaker yields the conversational floor. Built on streaming ASR (Automatic Speech Recognition) and incremental semantics, RESPOND continuously predicts both when and how to interject, enabling fluid, listener-aware dialogue. A defining feature is its designer-facing controllability: two orthogonal dials, Backchannel Intensity (frequency of acknowledgments) and Turn Claim Aggressiveness (depth and assertiveness of early contributions), can be tuned to match the etiquette of…
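The two dials described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `EngagementDials` and `decide_action`, the [0, 1] dial range, the per-step scores `p_backchannel` and `p_turn_claim`, and the linear dial-to-threshold mapping are all assumptions made for illustration. The sketch only shows how two independent controls could gate a continuous prediction stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngagementDials:
    """Two independent controls; the [0, 1] range is an assumption."""
    backchannel_intensity: float      # higher -> more frequent acknowledgments
    turn_claim_aggressiveness: float  # higher -> earlier, more assertive turn claims

    def __post_init__(self):
        for name in ("backchannel_intensity", "turn_claim_aggressiveness"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


def decide_action(p_backchannel: float, p_turn_claim: float,
                  dials: EngagementDials) -> str:
    """Choose an action for one incremental step of the input stream.

    p_backchannel and p_turn_claim are hypothetical scores in [0, 1] that a
    predictor might emit as streaming ASR output arrives.
    """
    # Assumed mapping: a higher dial setting lowers the score needed to act.
    # With a dial at 0.0 the threshold is 1.0, so the strict comparison
    # below never fires and that behavior is effectively switched off.
    claim_threshold = 1.0 - dials.turn_claim_aggressiveness
    backchannel_threshold = 1.0 - dials.backchannel_intensity

    # Assumed priority: a turn claim takes precedence over a backchannel.
    if p_turn_claim > claim_threshold:
        return "turn_claim"
    if p_backchannel > backchannel_threshold:
        return "backchannel"
    return "listen"


# The same scores yield different behavior under different dial settings.
attentive = EngagementDials(backchannel_intensity=0.8, turn_claim_aggressiveness=0.2)
assertive = EngagementDials(backchannel_intensity=0.2, turn_claim_aggressiveness=0.8)
print(decide_action(0.5, 0.5, attentive))
print(decide_action(0.5, 0.5, assertive))
```

Because each dial affects only its own threshold, the two settings can be changed independently, which is one simple way to read the abstract's description of the dials as orthogonal.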
