A higher order Minkowski loss for improved prediction ability of acoustic model in ASR
Vishwanath Pratap Singh, Shakti P. Rath, Abhishek Pandey

TL;DR
This paper introduces a higher-order Minkowski loss at inference time in ASR systems, exploiting higher-order statistics to improve prediction accuracy without altering the training procedure.
Contribution
It demonstrates that a higher-order Minkowski loss improves acoustic model predictions by exploiting higher-order statistics, and that it can be easily integrated into existing ASR systems at inference time.
Findings
A higher-order Minkowski loss reduces word error rate on the LibriSpeech datasets.
The method improves prediction accuracy across multiple baseline models.
No changes are needed to the training pipeline; only inference is modified.
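As a rough illustration of an inference-only change, the Python sketch below post-processes per-frame posteriors from an already trained acoustic model. It rests on an assumption of ours, not confirmed by the source: each class is treated as a binary 0/1 target. Under that assumption the minimizer of the expected order-q Minkowski loss |y - y_hat|^q (q even) is y_hat = p^(1/(q-1)) / (p^(1/(q-1)) + (1-p)^(1/(q-1))), where p is the second-order (conditional-mean) posterior. The function name `minkowski_posterior` and the per-frame renormalization step are hypothetical choices made for this illustration.

```python
import numpy as np


def minkowski_posterior(p, order=4, renormalize=True):
    """Hypothetical helper: map second-order posteriors p = E[y | x] to the
    minimizer of E[|y - y_hat|**order | x], treating every class as a
    binary 0/1 target (an assumption made for this sketch)."""
    if order < 2 or order % 2 != 0:
        raise ValueError("order must be an even integer >= 2")
    p = np.asarray(p, dtype=float)
    # Exponent 1/(q-1): 1 for q=2 (identity), 1/3 for q=4, 1/5 for q=6.
    r = 1.0 / (order - 1)
    num = p ** r
    y_hat = num / (num + (1.0 - p) ** r)
    if renormalize:
        # Per-class estimates no longer sum to one; rescale each frame
        # (assumption: the source does not say how this is handled).
        y_hat = y_hat / y_hat.sum(axis=-1, keepdims=True)
    return y_hat


# Usage with made-up frame posteriors (rows = frames, columns = classes).
frame_post = np.array([[0.70, 0.20, 0.10],
                       [0.05, 0.90, 0.05]])
q4_post = minkowski_posterior(frame_post, order=4)
q6_post = minkowski_posterior(frame_post, order=6)
```

Because the mapping is monotone in p, the most likely class in each frame is unchanged; what changes is how confident the estimates are, with higher orders pulling posteriors toward 0.5. Whether the paper renormalizes, or divides by class priors for hybrid decoding, is not stated here.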
Abstract
Conventional automatic speech recognition (ASR) systems use a second-order Minkowski loss at inference time, which is suboptimal because it incorporates only first-order statistics in posterior estimation [2]. In this paper we propose higher-order Minkowski losses (4th and 6th order) at inference time, without any change at training time. The main contribution of the paper is to show that a higher-order loss uses higher-order statistics in posterior estimation, which improves the prediction ability of the acoustic model in an ASR system. We show mathematically that the posterior probability obtained under the higher-order loss is a function of the second-order posterior, so the method can be incorporated into a standard ASR system with little effort. All changes are made at test (inference) time; we make no change to any training pipeline. Multiple…
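The claim that the higher-order posterior is a function of the second-order posterior can be illustrated with a short derivation. This is a sketch under our own assumptions, not necessarily the paper's derivation: each output unit is treated as a binary target y in {0, 1}, the loss is |y - y_hat|^q with even order q (4 or 6), and p = P(y = 1 | x) = E[y | x] denotes the posterior that minimizes the second-order loss.

```latex
\hat{y}^{\ast} = \arg\min_{\hat{y}} \mathbb{E}\big[(y-\hat{y})^{q} \mid x\big]
\quad\Rightarrow\quad
\mathbb{E}\big[(y-\hat{y})^{q-1} \mid x\big] = 0

% q = 4: the stationarity condition involves moments of y up to third order
\mathbb{E}[y^{3}\mid x] - 3\hat{y}\,\mathbb{E}[y^{2}\mid x] + 3\hat{y}^{2}\,\mathbb{E}[y\mid x] - \hat{y}^{3} = 0

% binary targets: \mathbb{E}[y^{n}\mid x] = p for every n \ge 1, so
p\,(1-\hat{y})^{q-1} = (1-p)\,\hat{y}^{q-1}
\quad\Rightarrow\quad
\hat{y}^{\ast} = \frac{p^{1/(q-1)}}{p^{1/(q-1)} + (1-p)^{1/(q-1)}}
```

For q = 2 this reduces to y_hat = p. For q = 4 and q = 6 the exponents are 1/3 and 1/5, which pull confident posteriors toward 0.5 while preserving their ranking, so the higher-order estimate can be computed directly from the output of a conventionally trained model.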
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
