Scenario Aware Speech Recognition: Advancements for Apollo Fearless Steps & CHiME-4 Corpora
Szu-Jui Chen, Wei Xia, John H.L. Hansen

TL;DR
This paper explores a triplet-loss based, non-semantic speech representation (TRILL) as a feature for ASR acoustic modeling. It reports WER reductions on the CHiME-4 and Fearless Steps corpora and better results than i-Vector speaker features.
Contribution
The paper introduces triplet-loss based embeddings as input features for ASR acoustic modeling, shows that they outperform i-Vectors, and reports improved recognition accuracy on two challenging corpora.
Findings
Triplet-loss embeddings outperform i-Vectors in acoustic modeling.
The approach achieves up to 11.90% relative WER reduction on real test data.
Additional techniques such as multi-style training further improve ASR performance.
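A relative WER reduction such as the one quoted above is measured against the baseline WER, not as a difference in absolute percentage points. The sketch below shows that arithmetic. The function name and the input values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in percent (e.g. 20.0 for 20%).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, NOT taken from the paper: a system that lowers
# WER from 20.0% to 18.0% gains 2.0 absolute points.
reduction = relative_wer_reduction(baseline_wer=20.0, new_wer=18.0)
print(f"{reduction:.2f}% relative WER reduction")
```

The same absolute gain yields a larger relative reduction when the baseline WER is lower, which is why relative figures should be read alongside the baseline.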
Abstract
In this study, we propose to investigate triplet loss as an alternative feature representation for ASR. We consider a general non-semantic speech representation, which is trained with a self-supervised criterion based on triplet loss called TRILL, for acoustic modeling to represent the acoustic characteristics of each audio. This strategy is then applied to the CHiME-4 corpus and CRSS-UTDallas Fearless Steps Corpus, with emphasis on the 100-hour challenge corpus which consists of 5 selected NASA Apollo-11 channels. An analysis of the extracted embeddings provides the foundation needed to characterize training utterances into distinct groups based on distinguishing acoustic properties. Moreover, we demonstrate that triplet-loss based embedding performs better than i-Vector in acoustic modeling, confirming that the triplet loss is more effective than a speaker feature.…
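To make the triplet-loss idea in the abstract concrete, here is a minimal sketch of a generic margin-based triplet loss on embedding vectors. It is an assumption-laden illustration, not the TRILL training objective: the abstract does not give the exact loss form, the margin, or how triplets are sampled, so the squared Euclidean distance, the hinge, and the `margin` value below are all illustrative choices.

```python
import numpy as np


def triplet_loss(anchor: np.ndarray,
                 positive: np.ndarray,
                 negative: np.ndarray,
                 margin: float = 0.1) -> float:
    """Generic hinge-style triplet loss, averaged over a batch.

    Each input has shape (batch, dim). The loss is zero for a triplet
    once the anchor is closer to the positive than to the negative by
    at least `margin` (a hypothetical value, not from the paper).
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # anchor-negative distance
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


# Toy embeddings standing in for audio-segment embeddings (illustrative only).
rng = np.random.default_rng(0)
anchor = rng.normal(size=(4, 8))
positive = anchor + 0.01 * rng.normal(size=(4, 8))  # acoustically similar segments
negative = rng.normal(size=(4, 8))                  # unrelated segments
print(triplet_loss(anchor, positive, negative))
```

Minimizing a loss of this kind pulls embeddings of similar audio together and pushes dissimilar audio apart, which is the property that lets the extracted embeddings group utterances by acoustic characteristics.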
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Test · Triplet Loss
