DNN-Based Semantic Model for Rescoring N-best Speech Recognition List

Dominique Fohr; Irina Illina

arXiv:2011.00975·cs.CL·November 3, 2020

Open Access

TL;DR

This paper introduces a DNN-based semantic rescoring method for N-best speech recognition hypotheses, utilizing word embeddings and acoustic features to reduce word error rate under noisy conditions.

Contribution

It proposes a novel DNN model that incorporates semantic and acoustic features for rescoring, improving speech recognition accuracy in noisy environments.

Findings

01

Significant WER reduction in noisy conditions

02

Effective use of word2vec and BERT embeddings

03

Improved performance over baseline models

Abstract

The word error rate (WER) of an automatic speech recognition (ASR) system increases when the training and testing conditions are mismatched, for example because of noise. In this case, the acoustic information can be less reliable. This work aims to improve ASR by modeling long-term semantic relations to compensate for distorted acoustic features. We propose to do this by rescoring the ASR N-best hypotheses list with a trained deep neural network (DNN). The DNN rescoring model aims to select hypotheses that have better semantic consistency and therefore lower WER. We investigate two types of representations as part of the input features to the DNN model: static word embeddings (from word2vec) and dynamic contextual embeddings (from BERT). Acoustic and linguistic features are also included. We perform experiments on the publicly available dataset…
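The rescoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' DNN: it swaps in a hand-weighted linear combination of an acoustic score, a language model score, and a semantic consistency score computed as the mean pairwise cosine similarity of static word embeddings. The function names, the weights, the toy embeddings, and the scores are all hypothetical and chosen only to show how a semantic term can change which hypothesis is selected.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def semantic_consistency(words, embeddings):
    """Mean pairwise cosine similarity over the words that have an embedding.

    Assumption: this simple average stands in for the learned semantic
    model described in the abstract.
    """
    vecs = [embeddings[w] for w in words if w in embeddings]
    if len(vecs) < 2:
        return 0.0
    pairs = [(i, j) for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    return sum(cosine(vecs[i], vecs[j]) for i, j in pairs) / len(pairs)

def rescore(nbest, embeddings, w_ac=1.0, w_lm=1.0, w_sem=1.0):
    """Return the hypothesis with the highest weighted combined score.

    The weights are hypothetical; a trained model would learn the combination.
    """
    def total(hyp):
        sem = semantic_consistency(hyp["text"].split(), embeddings)
        return w_ac * hyp["acoustic"] + w_lm * hyp["lm"] + w_sem * sem
    return max(nbest, key=total)

# Toy 2-dimensional embeddings (made up for illustration, not word2vec output).
TOY_EMBEDDINGS = {
    "boat": [1.0, 0.0],
    "sails": [0.9, 0.2],
    "sea": [0.8, 0.1],
    "sales": [0.0, 1.0],
}

# Toy N-best list with made-up log-domain scores (higher is better).
NBEST = [
    {"text": "boat sales sea", "acoustic": -10.0, "lm": -5.0},
    {"text": "boat sails sea", "acoustic": -10.2, "lm": -5.0},
]

if __name__ == "__main__":
    print("without semantics:", rescore(NBEST, TOY_EMBEDDINGS, w_sem=0.0)["text"])
    print("with semantics:   ", rescore(NBEST, TOY_EMBEDDINGS)["text"])
```

In this toy setup, the acoustically preferred hypothesis wins when the semantic weight is zero, and the semantically coherent one wins once the semantic term is included. Contextual embeddings such as those from BERT would replace the static lookup table with per-sentence vectors, which this sketch does not attempt.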

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques