A Mixture of Expert Based Deep Neural Network for Improved ASR
Vishwanath Pratap Singh, Shakti P. Rath, Abhishek Pandey

TL;DR
This paper introduces MixNet, a deep neural network architecture for acoustic modeling in ASR that adds Mixture of Experts (MoE) layers to better handle overlap between acoustic classes, yielding relative WER reductions of 13.6% over DNN and 10.0% over LSTM baselines.
Contribution
The paper proposes MixNet, a novel architecture with MoE layers based on pre-defined broad phonetic classes and automatically learned acoustic classes, improving class separation and ASR accuracy.
Findings
Achieved 13.6% relative WER reduction over DNN models.
Achieved 10.0% relative WER reduction over LSTM models.
Significantly outperformed existing phone-classification methods.
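The findings above are stated as relative WER reductions. Assuming the usual definition (baseline WER minus new WER, divided by baseline WER), the computation can be sketched as follows. The function name is illustrative, and the WER values in the usage line are made up. They are not taken from the paper and only show that a 13.6% relative reduction is much smaller than 13.6 absolute points.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline.

    Assumes the common definition of relative reduction; the paper's exact
    convention is not stated in this summary.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers: a 20.0% baseline WER reduced to 17.28% is only a
# 2.72-point absolute drop, but a 13.6% relative reduction.
print(round(relative_wer_reduction(20.0, 17.28), 2))  # prints 13.6
```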
Abstract
This paper presents a novel deep learning architecture for acoustic modeling in Automatic Speech Recognition (ASR), termed MixNet. Besides the conventional layers, such as fully connected layers in DNN-HMM and memory cells in LSTM-HMM, the model uses two additional layers based on Mixture of Experts (MoE). The first MoE layer, operating at the input, is based on pre-defined broad phonetic classes, and the second, operating at the penultimate layer, is based on automatically learned acoustic classes. In natural speech, overlap in distribution across different acoustic classes is inevitable, which leads to inter-class misclassification. ASR accuracy is expected to improve if the conventional acoustic model architecture is modified to better account for such overlaps. MixNet is developed with this in mind. Analysis conducted by means of…
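The abstract does not give MixNet's equations, so the following is only a rough illustration of a generic soft mixture-of-experts layer: a gating vector weights the outputs of several expert sub-networks, y = Σ_k g_k(x) · tanh(W_k x). Everything here is a hypothetical assumption rather than the authors' formulation. That includes the class name `MoELayer`, the single-matrix tanh experts, the linear-softmax gate (standing in for the "automatically learned acoustic classes"), and the optional externally supplied `gate` argument (standing in for posteriors over "pre-defined broad phonetic classes").

```python
import math
import random
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class MoELayer:
    """Hypothetical soft MoE layer: y = sum_k g_k(x) * tanh(W_k x)."""

    def __init__(self, in_dim: int, out_dim: int, num_experts: int, seed: int = 0):
        rng = random.Random(seed)
        # One weight matrix per expert (assumed single-layer tanh experts).
        self.experts = [random_matrix(out_dim, in_dim, rng) for _ in range(num_experts)]
        # Linear gating network producing one score per expert.
        self.gate = random_matrix(num_experts, in_dim, rng)

    def gate_weights(self, x: Vector) -> Vector:
        # Learned gate: softmax over experts (stand-in for learned acoustic classes).
        return softmax(matvec(self.gate, x))

    def forward(self, x: Vector, gate: Optional[Vector] = None) -> Vector:
        # If `gate` is supplied (e.g. broad-phonetic-class posteriors from some
        # external classifier, an assumption here), it replaces the learned gate.
        g = self.gate_weights(x) if gate is None else gate
        out_dim = len(self.experts[0])
        y = [0.0] * out_dim
        for g_k, w_k in zip(g, self.experts):
            h_k = [math.tanh(v) for v in matvec(w_k, x)]
            for i in range(out_dim):
                y[i] += g_k * h_k[i]  # gate-weighted sum of expert outputs
        return y


layer = MoELayer(in_dim=40, out_dim=8, num_experts=4)
frame = [0.01 * i for i in range(40)]  # a dummy 40-dim feature frame
g = layer.gate_weights(frame)
print(len(layer.forward(frame)), round(sum(g), 6))  # prints 8 1.0
```

Because the gate weights sum to one, each expert can specialize on a region of feature space while overlapping frames receive a blend of experts. This is the general intuition behind using MoE layers for overlapping classes, not a claim about MixNet's exact mechanism.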
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Average Pooling · 1x1 Convolution · Convolution · Tanh Activation · Global Average Pooling · Dropout · Batch Normalization · Mixed Depthwise Convolution
