An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems

Hitesh Tulsiani; David M. Chan; Shalini Ghosh; Garima Lalwani; Prabhat Pandey; Ankish Bansal; Sri Garimella; Ariya Rastrow; Björn Hoffmeister

arXiv:2409.10515·eess.AS·September 17, 2024

Open Access

TL;DR

This paper presents a self-learning framework for dialog-system ASR that adapts over time using user feedback and conversational context, reducing word error rate (WER) on both real-world and synthetic datasets.

Contribution

The work introduces a general, context-aware self-learning framework leveraging student-teacher models and contrastive self-supervision for improved dialog ASR.

Findings

01 Near 10% relative WER reduction in real-world systems

02 Up to 26% WER reduction on synthetic data

03 Effective adaptation to multi-turn conversations
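
To make the "relative WER reduction" figures concrete, here is a minimal sketch of how word error rate and a relative reduction are commonly computed. The function names, the example utterances, and the 10.0 to 9.0 WER values are illustrative assumptions, not numbers or code from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that the new system removes."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical example: one substituted word out of four.
    print(word_error_rate("play the next song", "play a next song"))
    # Hypothetical WERs: moving from 10.0% to 9.0% is a 10% relative reduction.
    print(relative_wer_reduction(10.0, 9.0))
```

A relative reduction is measured against the baseline error rate, so a 10% relative reduction from a 10.0% baseline is one absolute percentage point.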

Abstract

Dialog systems, such as voice assistants, are expected to engage with users in complex, evolving conversations. Unfortunately, traditional automatic speech recognition (ASR) systems deployed in such applications are usually trained to recognize each turn independently and lack the ability to adapt to the conversational context or incorporate user feedback. In this work, we introduce a general framework for ASR in dialog systems that can go beyond learning from single-turn utterances and learn over time how to adapt to both explicit supervision and implicit user feedback present in multi-turn conversations. We accomplish that by leveraging advances in student-teacher learning and context-aware dialog processing, and designing contrastive self-supervision approaches with Ohm, a new online hard-negative mining approach. We show that leveraging our new framework compared to traditional…
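
The abstract mentions contrastive self-supervision with online hard-negative mining. The sketch below illustrates the general idea only: an InfoNCE-style loss in which the negatives most similar to the anchor are selected at loss time. It is not the paper's Ohm method; the function names, the top-k selection rule, and the temperature value are all assumptions made for illustration, and inputs are assumed to be non-zero vectors.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hard_negative_contrastive_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int = 2,
    temperature: float = 0.1,
) -> float:
    """InfoNCE-style loss using only the k candidates closest to the anchor."""
    # Online mining step: rank candidate negatives by similarity to the anchor
    # and keep the k most similar (the "hardest") ones.
    negative_sims = sorted(
        (cosine(anchor, c) for c in candidates), reverse=True
    )[:k]
    logits = [cosine(anchor, positive) / temperature]
    logits += [s / temperature for s in negative_sims]
    # Numerically stable log-sum-exp over the positive and mined negatives.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # Negative log-probability of the positive among the selected items.
    return log_denominator - logits[0]


if __name__ == "__main__":
    anchor, positive = [1.0, 0.0], [1.0, 0.0]
    easy = [[0.0, 1.0], [-1.0, 0.0]]
    hard = [[0.9, 0.1], [0.0, 1.0]]
    print(hard_negative_contrastive_loss(anchor, positive, easy))
    print(hard_negative_contrastive_loss(anchor, positive, hard))
```

Negatives that lie close to the anchor contribute most of the loss, which is why selecting them explicitly can give a stronger training signal than sampling negatives at random.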

Taxonomy

Topics: Speech and dialogue systems · Multimodal Machine Learning Applications · Multi-Agent Systems and Negotiation