Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue Systems
Ting-En Lin, Yuchuan Wu, Fei Huang, Luo Si, Jian Sun, Yongbin Li

TL;DR
This paper introduces Duplex Conversation, a multimodal spoken dialogue system that mimics human-like interactions in telephone-based customer service, improving turn-taking and reducing response latency.
Contribution
The paper presents a multimodal spoken dialogue system, applies semi-supervised learning with multimodal data augmentation to improve generalization from unlabeled data, and reports deployment in Alibaba intelligent customer service, where it reduced response latency.
Findings
Achieved consistent improvements over baselines on all three subtasks: user state detection, backchannel selection, and barge-in detection.
Reduced response latency by 50% in live deployment.
Validated effectiveness through online A/B experiments.
Abstract
In this paper, we present Duplex Conversation, a multi-turn, multimodal spoken dialogue system that enables telephone-based agents to interact with customers like a human. We use the concept of full-duplex in telecommunication to demonstrate what a human-like interactive experience should be and how to achieve smooth turn-taking through three subtasks: user state detection, backchannel selection, and barge-in detection. Besides, we propose semi-supervised learning with multimodal data augmentation to leverage unlabeled data to increase model generalization. Experimental results on three sub-tasks show that the proposed method achieves consistent improvements compared with baselines. We deploy the Duplex Conversation to Alibaba intelligent customer service and share lessons learned in production. Online A/B experiments show that the proposed system can significantly reduce response…
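The three subtasks named in the abstract can be pictured as inputs to a single turn-taking decision. The sketch below is illustrative only and is not the authors' implementation: the label set in `UserState`, the `Action` values, the `Signals` container, the `decide` function, and the 0.5 threshold are all assumptions introduced here, and the models that would produce these signals are left out entirely.

```python
from dataclasses import dataclass
from enum import Enum


class UserState(Enum):
    # Hypothetical label set; the abstract only names the subtask.
    TURN_SWITCH = "turn_switch"  # user has finished, agent may take the turn
    TURN_KEEP = "turn_keep"      # user paused but is expected to continue
    HESITATION = "hesitation"    # user is hesitating mid-utterance


class Action(Enum):
    # Hypothetical agent actions for this sketch.
    RESPOND = "respond"
    BACKCHANNEL = "backchannel"
    WAIT = "wait"
    KEEP_SPEAKING = "keep_speaking"
    STOP_SPEAKING = "stop_speaking"


@dataclass
class Signals:
    user_state: UserState   # assumed output of user state detection
    barge_in_prob: float    # assumed output of barge-in detection, in [0, 1]
    agent_speaking: bool    # whether the agent is currently talking


def decide(signals: Signals, barge_in_threshold: float = 0.5) -> Action:
    """Map subtask outputs to one agent action (toy rule set, assumed)."""
    if signals.agent_speaking:
        # While the agent talks, only a detected barge-in interrupts it.
        if signals.barge_in_prob >= barge_in_threshold:
            return Action.STOP_SPEAKING
        return Action.KEEP_SPEAKING
    if signals.user_state is UserState.TURN_SWITCH:
        return Action.RESPOND
    if signals.user_state is UserState.HESITATION:
        # A short acknowledgement; choosing which one is the
        # backchannel selection subtask, not modeled here.
        return Action.BACKCHANNEL
    return Action.WAIT


if __name__ == "__main__":
    print(decide(Signals(UserState.TURN_SWITCH, 0.1, agent_speaking=False)))
    print(decide(Signals(UserState.TURN_KEEP, 0.9, agent_speaking=True)))
```

In this toy version the barge-in check takes priority whenever the agent is speaking, so an interruption is handled before any user-state logic runs. A real system would presumably replace these hand-written rules with learned models and timing constraints.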
