A Reference-less Quality Metric for Automatic Speech Recognition via Contrastive-Learning of a Multi-Language Model with Self-Supervision

Kamer Ali Yuksel; Thiago Ferreira; Ahmet Gunduz; Mohamed Al-Badrashiny; Golara Javadi

arXiv:2306.13114 · cs.CL · June 26, 2023 · 1 citation

TL;DR

This paper introduces a multi-language, reference-less quality metric for ASR systems that fine-tunes a pre-trained language model with contrastive learning; the metric correlates better with WER than perplexity-based metrics do and improves hypothesis ensembling.

Contribution

It presents a new self-supervised contrastive learning approach to fine-tune a multilingual language model for reference-less ASR quality assessment, outperforming existing perplexity-based metrics.
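This page does not spell out the training objective, so the following is only a hedged sketch of what contrastive fine-tuning for quality estimation can look like: a pairwise, Siamese-style ranking loss in which one shared scorer rates two hypotheses of the same audio, and the loss is small when the hypothesis assumed to be better receives the higher score. The function names, the logistic form of the loss, and the assumption that each pair arrives with a known "better" member are illustrative choices, not the authors' implementation; the scores are plain floats standing in for the language model's output.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(better_scores: Sequence[float],
                          worse_scores: Sequence[float]) -> float:
    """Mean logistic pairwise loss: -log(sigmoid(s_better - s_worse)).

    Index i holds the scores that one shared (Siamese) scorer gave to two
    hypotheses of the same audio clip. Which member of a pair is "better"
    is an assumed input here; the page does not say how pairs are labeled.
    """
    if len(better_scores) != len(worse_scores) or not better_scores:
        raise ValueError("need two equally sized, non-empty score lists")
    # -log(sigmoid(d)) == log(1 + exp(-d)) == softplus(-d)
    losses = [softplus(-(b - w)) for b, w in zip(better_scores, worse_scores)]
    return sum(losses) / len(losses)
```

Because softplus(x) - softplus(-x) = x, swapping the two members of a pair changes that pair's loss by exactly the score gap, which is what pushes the scorer to rank the better hypothesis higher. In an actual training loop (not shown), the scores would come from the multilingual LM and this mean loss would be minimized with respect to its parameters.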

Findings

1. Higher correlation with WER than perplexity-based metrics
2. Reduces WER by over 7% in hypothesis ensembling
3. Effective across multiple languages and unseen datasets
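The second finding refers to hypothesis ensembling, but the page does not describe the procedure. One simple form, sketched here under that assumption, is per-utterance selection: given transcripts of the same audio from several ASR engines, keep the one the reference-less metric rates highest. `select_best_hypotheses` and `score_fn` are hypothetical names, and `score_fn` stands in for the trained quality model.

```python
from typing import Callable, Dict, List


def select_best_hypotheses(
    hypotheses: Dict[str, List[str]],
    score_fn: Callable[[str], float],
) -> List[str]:
    """For each utterance, keep the engine output with the highest quality score.

    `hypotheses` maps an engine name to its transcripts, aligned by utterance
    index. `score_fn` is a hypothetical stand-in for a reference-less quality
    metric (higher = better); no reference transcript is used.
    """
    engines = list(hypotheses)
    if not engines:
        return []
    n = len(hypotheses[engines[0]])
    if any(len(hypotheses[e]) != n for e in engines):
        raise ValueError("all engines must cover the same utterances")
    selected = []
    for i in range(n):
        # max() returns the first engine among ties, so results are deterministic.
        best_engine = max(engines, key=lambda e: score_fn(hypotheses[e][i]))
        selected.append(hypotheses[best_engine][i])
    return selected
```

Since selection needs no reference, the WER of the selected transcripts can afterwards be compared with each single engine's WER on a labeled subset to check whether the ensemble actually helps.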

Abstract

The common standard for evaluating the quality of automatic speech recognition (ASR) systems is reference-based metrics such as the Word Error Rate (WER), which are computed from manual ground-truth transcriptions that are time-consuming and expensive to obtain. This work proposes a multi-language referenceless quality metric, which allows comparing the performance of different ASR models on a speech dataset without ground-truth transcriptions. To estimate the quality of ASR hypotheses, a pre-trained language model (LM) is fine-tuned with contrastive learning in a self-supervised manner. In experiments conducted on several unseen test datasets consisting of outputs from top commercial ASR engines in various languages, the proposed referenceless metric obtains a much higher correlation with WER scores and their ranks than the perplexity metric from the state-of-the-art multilingual LM in all…

Code & Models

Repositories

aixplain/NoRefER (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing

Methods: Contrastive Learning