AMPS: ASR with Multimodal Paraphrase Supervision

Abhishek Gupta; Amruta Parulekar; Sameep Chattopadhyay; Preethi Jyothi

arXiv:2411.18368·cs.CL·April 18, 2025

TL;DR

This paper introduces AMPS, a training technique for multilingual multimodal ASR that uses paraphrases of the reference transcriptions as additional supervision to improve conversational speech recognition. With SeamlessM4T, it achieves relative WER reductions of up to 5%.

Contribution

AMPS incorporates paraphrase-based supervision into a multilingual multimodal ASR system to improve conversational speech recognition.

Findings

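01 Relative WER reductions of up to 5% with SeamlessM4T on conversational speech in Hindi, Marathi, Malayalam, Kannada, and Nyanja.

02 Paraphrase supervision, invoked selectively for utterances with poor ASR performance, improves recognition accuracy.

03 The system is analyzed in detail using both objective and human evaluation metrics.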

Abstract

Spontaneous or conversational multilingual speech presents many challenges for state-of-the-art automatic speech recognition (ASR) systems. In this work, we present a new technique AMPS that augments a multilingual multimodal ASR system with paraphrase-based supervision for improved conversational ASR in multiple languages, including Hindi, Marathi, Malayalam, Kannada, and Nyanja. We use paraphrases of the reference transcriptions as additional supervision while training the multimodal ASR model and selectively invoke this paraphrase objective for utterances with poor ASR performance. Using AMPS with a state-of-the-art multimodal model SeamlessM4T, we obtain significant relative reductions in word error rates (WERs) of up to 5%. We present detailed analyses of our system using both objective and human evaluation metrics.
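The selective use of the paraphrase objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: using the per-utterance ASR loss as the signal for "poor ASR performance", the `threshold` and `weight` parameters, and the names `UtteranceLosses`, `selective_loss`, and `batch_loss` are all assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UtteranceLosses:
    """Per-utterance losses from a hypothetical multimodal ASR model."""
    asr_loss: float         # loss against the reference transcription
    paraphrase_loss: float  # loss against a paraphrase of the reference


def selective_loss(item: UtteranceLosses, threshold: float, weight: float = 1.0) -> float:
    """Add the paraphrase term only when the ASR loss exceeds `threshold`.

    Assumption: the ASR loss stands in for "poor ASR performance";
    `threshold` and `weight` are illustrative knobs, not values from the paper.
    """
    if item.asr_loss > threshold:
        return item.asr_loss + weight * item.paraphrase_loss
    return item.asr_loss


def batch_loss(batch: List[UtteranceLosses], threshold: float, weight: float = 1.0) -> float:
    """Mean of the selectively combined losses over a batch."""
    if not batch:
        raise ValueError("batch must not be empty")
    return sum(selective_loss(u, threshold, weight) for u in batch) / len(batch)


if __name__ == "__main__":
    batch = [UtteranceLosses(0.5, 2.0), UtteranceLosses(3.0, 1.0)]
    # Only the second utterance exceeds the threshold and gets the paraphrase term.
    print(batch_loss(batch, threshold=1.0))
```

In this sketch, utterances the model already transcribes well are trained on the ASR objective alone, while harder utterances also receive the paraphrase signal.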

Videos

AMPS: ASR with Multimodal Paraphrase Supervision (Underline)

Taxonomy

Topics: Speech and dialogue systems