Scenario Aware Speech Recognition: Advancements for Apollo Fearless Steps & CHiME-4 Corpora

Szu-Jui Chen; Wei Xia; John H.L. Hansen

arXiv:2109.11086 · cs.SD · September 24, 2021 · 1 citation

Open Access

TL;DR

This paper explores triplet loss for non-semantic speech representation in ASR, demonstrating improved acoustic modeling and WER reductions on CHiME-4 and Fearless Steps datasets, surpassing traditional features like i-Vector.

Contribution

It introduces the use of triplet-loss based embeddings for acoustic modeling in ASR, showing superior performance over i-Vectors and enhancing recognition accuracy on challenging corpora.

Findings

01. Triplet-loss embeddings outperform i-Vectors in acoustic modeling.

02. Achieved up to 11.90% relative WER reduction on real test data.

03. Enhanced ASR performance with additional techniques like multi-style training.
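For readers unfamiliar with the metric, a relative WER reduction compares the improvement against the baseline error rate rather than reporting the absolute difference. The sketch below shows the arithmetic; the function name and the WER values in the usage line are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in percent. The result is
    (baseline - new) / baseline * 100, so a positive value means the
    new system makes fewer errors than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, for illustration only (not results from the paper):
# a baseline at 20.0% WER and a new system at 18.0% WER.
reduction = relative_wer_reduction(20.0, 18.0)
print(f"relative WER reduction: {reduction:.2f}%")
```

Note that the same absolute gain yields a larger relative reduction when the baseline WER is lower, which is why relative figures should be read alongside the baseline.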

Abstract

In this study, we investigate triplet loss as the basis of an alternative feature representation for ASR. We consider a general non-semantic speech representation called TRILL, which is trained with a self-supervised criterion based on triplet loss, and use it in acoustic modeling to represent the acoustic characteristics of each audio recording. This strategy is then applied to the CHiME-4 corpus and the CRSS-UTDallas Fearless Steps Corpus, with emphasis on the 100-hour challenge corpus, which consists of 5 selected NASA Apollo-11 channels. An analysis of the extracted embeddings provides the foundation needed to characterize training utterances into distinct groups based on their distinguishing acoustic properties. Moreover, we demonstrate that the triplet-loss based embedding performs better than i-Vector in acoustic modeling, confirming that the triplet loss is more effective than a speaker feature.…
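As background for the triplet-loss criterion mentioned in the abstract, the following is a minimal sketch of a generic hinge-style triplet loss on embedding vectors. It assumes squared Euclidean distance and a fixed margin; these choices, and all function and variable names, are illustrative assumptions and may differ from the exact objective used to train TRILL or from the authors' setup.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 1.0) -> float:
    """Generic hinge triplet loss (an illustrative form, not the paper's code).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`; otherwise it grows linearly with
    the violation.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings (hypothetical values).
anchor = [0.0, 0.0]
near = [0.0, 1.0]   # squared distance 1 from the anchor
far = [3.0, 4.0]    # squared distance 25 from the anchor

print(triplet_loss(anchor, near, far))  # constraint satisfied, loss is zero
print(triplet_loss(anchor, far, near))  # constraint violated, loss is positive
```

In a self-supervised setting, the positive is typically a segment assumed to share acoustic conditions with the anchor (for example, one that is close in time), while the negative comes from elsewhere; minimizing the loss pulls the former together and pushes the latter apart in the embedding space.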

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Test · Triplet Loss