AVATAR: Robust Voice Search Engine Leveraging Autoregressive Document Retrieval and Contrastive Learning

Yi-Cheng Wang; Tzu-Ting Yang; Hsin-Wei Wang; Bi-Cheng Yan; Berlin Chen

arXiv:2309.01395 · cs.IR · September 6, 2023

TL;DR

This paper introduces AVATAR, a robust voice search system that uses autoregressive document retrieval and contrastive learning to mitigate ASR errors, improving performance and robustness on open-domain question answering tasks.

Contribution

The paper proposes a voice search approach that combines autoregressive retrieval with contrastive learning to handle ASR noise, a combination it presents as new to this domain.

Findings

01. Enhanced robustness against ASR errors, demonstrated in experiments
02. Significant performance improvements over baseline systems
03. Effective noise modeling through data augmentation and contrastive learning

Abstract

Voice input has become increasingly popular on mobile devices and seems set to supplant text input almost entirely. Through voice, a voice search (VS) system can offer a more natural way to meet users' information needs. However, errors from the automatic speech recognition (ASR) system can be catastrophic for the VS system. Recent lightweight autoregressive retrieval models have the potential to be deployed on mobile devices, which could lead to a more secure and personal VS assistant. Building on such a model, this paper presents a novel study of VS leveraging autoregressive retrieval and tackles a crucial problem facing VS, namely the performance drop caused by ASR noise, via data augmentation and contrastive learning, showing how explicitly and implicitly modeling the noise patterns can alleviate the problem. A series of experiments conducted on the Open-Domain Spoken Question Answering (ODSQA) dataset confirm…
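To make the contrastive-learning idea in the abstract concrete, here is a minimal sketch of an in-batch InfoNCE-style loss that pulls the embedding of a clean query toward the embedding of its ASR-noisy counterpart and away from other queries in the batch. This is a generic formulation and an assumption on our part, not the loss defined in the paper; the function names (`cosine`, `info_nce`), the temperature value, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(clean, noisy, temperature=0.1):
    """Mean InfoNCE loss over a batch of paired embeddings.

    clean[i] and noisy[i] are assumed to embed the same query (positive pair);
    noisy[j] for j != i serve as in-batch negatives for clean[i].
    """
    losses = []
    for i, anchor in enumerate(clean):
        logits = [cosine(anchor, other) / temperature for other in noisy]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the true noisy counterpart.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy, made-up embeddings (hypothetical): two clean queries and two pairings
# of noisy transcripts, one correctly aligned and one deliberately swapped.
clean = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = info_nce(clean, aligned)
loss_swapped = info_nce(clean, swapped)
```

With these toy vectors, the loss is lower when each noisy embedding sits close to its clean counterpart than when the pairs are swapped, which is the behavior a noise-robust query encoder would be trained toward under this assumed objective.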

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling