Towards Improving the Performance of Pre-Trained Speech Models for Low-Resource Languages Through Lateral Inhibition
Andrei-Marius Avram, Răzvan-Alexandru Smădu, Vasile Păiș, Dumitru-Clementin Cercel, Radu Ion, and Dan Tufiș

TL;DR
This paper improves pre-trained speech models for low-resource languages by replacing the dense fine-tuning layer with a biologically inspired lateral inhibition layer, yielding an average 12.5% word error rate (WER) improvement and state-of-the-art results on two Romanian speech datasets.
Contribution
It replaces the dense layer used during fine-tuning with a lateral inhibition layer, a simple change that improves pre-trained speech models for low-resource languages.
Findings
Average 12.5% WER reduction on Romanian speech tasks
State-of-the-art WER of 1.78% on the Romanian Speech Corpus
State-of-the-art WER of 29.64% on the Robin Technical Acquisition Corpus
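The findings above are all reported as word error rate (WER). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; this is not the paper's evaluation script, which may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.2%}")  # 2 errors over 6 reference words
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many insertions.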
Abstract
With the rise of bidirectional encoder representations from Transformer models in natural language processing, the speech community has adopted some of their development methodologies. Therefore, the Wav2Vec models were introduced to reduce the data required to obtain state-of-the-art results. This work leverages this knowledge and improves the performance of the pre-trained speech models by simply replacing the fine-tuning dense layer with a lateral inhibition layer inspired by the biological process. Our experiments on Romanian, a low-resource language, show an average improvement of 12.5% word error rate (WER) using the lateral inhibition layer. In addition, we obtain state-of-the-art results on both the Romanian Speech Corpus and the Robin Technical Acquisition Corpus with 1.78% WER and 29.64% WER, respectively.
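The abstract does not spell out how the lateral inhibition layer works, so the following is only a sketch under stated assumptions. It assumes a commonly used formulation in which each feature is passed through or suppressed by a Heaviside gate driven by a weighted sum of the other features: y_i = x_i · H(Σ_{j≠i} x_j w_{ji} + b_i). The names `lateral_inhibition` and `heaviside`, the toy weights, and the choice H(0) = 0 are all illustrative and not taken from the paper. Only the forward pass is shown; because the step function has zero gradient almost everywhere, training such a layer would need a surrogate gradient, which is omitted here.

```python
from typing import List


def heaviside(z: float) -> float:
    """Step gate: 1.0 for strictly positive input, 0.0 otherwise (assumed convention)."""
    return 1.0 if z > 0 else 0.0


def lateral_inhibition(
    x: List[float], weights: List[List[float]], bias: List[float]
) -> List[float]:
    """Forward pass of an assumed lateral inhibition gate over one feature vector."""
    n = len(x)
    out = []
    for i in range(n):
        # Weighted sum over the *other* features only: skipping j == i plays the
        # role of zeroing the diagonal, so a feature cannot inhibit itself.
        z = bias[i] + sum(x[j] * weights[j][i] for j in range(n) if j != i)
        out.append(x[i] * heaviside(z))
    return out


# Toy input: three features with hand-picked (hypothetical) weights.
x = [0.5, -1.0, 2.0]
weights = [
    [0.0, 1.0, -1.0],
    [1.0, 0.0, 1.0],
    [-1.0, 1.0, 0.0],
]
bias = [0.0, 0.0, 0.0]

gated = lateral_inhibition(x, weights, bias)
print(gated)  # features 0 and 2 are suppressed, feature 1 passes through
```

Unlike a dense layer, this gate does not mix features into new values: each output is either the original feature or zero, so the layer acts as a learned, input-dependent feature selector.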
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Residual Connection
