Towards Improving the Performance of Pre-Trained Speech Models for Low-Resource Languages Through Lateral Inhibition

Andrei-Marius Avram, Răzvan-Alexandru Smădu, Vasile Păiș, Dumitru-Clementin Cercel, Radu Ion, and Dan Tufiș

arXiv:2306.17792 · cs.CL · July 3, 2023

TL;DR

This paper improves pre-trained speech models for low-resource languages by adding a biologically inspired lateral inhibition layer during fine-tuning, yielding an average 12.5% WER improvement and state-of-the-art results on two Romanian speech datasets.

Contribution

The paper replaces the dense layer used for fine-tuning with a lateral inhibition layer, a simple change that improves fine-tuned speech models for low-resource languages.

Findings

01. Average 12.5% WER reduction on Romanian speech tasks
02. State-of-the-art WER of 1.78% on the Romanian Speech Corpus
03. State-of-the-art WER of 29.64% on the Robin Technical Acquisition Corpus
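
All three findings are reported as word error rate (WER). The following is a minimal sketch of the standard definition, the word-level edit distance between a reference transcript and a hypothesis divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions and do not come from the paper, whose evaluation tooling is not described on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one
# deletion ("the") against a six-word reference, so WER = 2 / 6.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many insertions.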

Abstract

With the rise of bidirectional encoder representations from Transformer models in natural language processing, the speech community has adopted some of their development methodologies. Therefore, the Wav2Vec models were introduced to reduce the data required to obtain state-of-the-art results. This work leverages this knowledge and improves the performance of the pre-trained speech models by simply replacing the fine-tuning dense layer with a lateral inhibition layer inspired by the biological process. Our experiments on Romanian, a low-resource language, show an average improvement of 12.5% word error rate (WER) using the lateral inhibition layer. In addition, we obtain state-of-the-art results on both the Romanian Speech Corpus and the Robin Technical Acquisition Corpus with 1.78% WER and 29.64% WER, respectively.
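
The abstract does not give the formula for the lateral inhibition layer. As a rough illustration only, the sketch below assumes one formulation from the lateral inhibition literature: each feature is passed through or suppressed by a Heaviside step applied to a weighted sum of the other features, with the diagonal of the weight matrix ignored so that a feature never inhibits itself. The function name, the weights, and the inputs are hypothetical, and only the forward pass is shown.

```python
def lateral_inhibition(x, weights, bias):
    """Forward pass of an assumed lateral inhibition gate.

    x:       list of n feature values
    weights: n x n list of lists; weights[i][j] is the influence of
             feature i on feature j (the diagonal is ignored)
    bias:    list of n bias values

    Assumed formulation (not stated in the abstract):
        out[j] = x[j] * step(sum over i != j of x[i] * weights[i][j] + bias[j])
    """
    n = len(x)
    out = []
    for j in range(n):
        # Skip i == j: a feature does not inhibit itself.
        drive = sum(x[i] * weights[i][j] for i in range(n) if i != j) + bias[j]
        gate = 1.0 if drive > 0 else 0.0  # Heaviside step
        out.append(x[j] * gate)
    return out


# Hypothetical 3-feature example.
features = [1.0, 2.0, 3.0]
weights = [
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
]
gated = lateral_inhibition(features, weights, [0.0, 0.0, 0.0])
# gated == [0.0, 2.0, 0.0]: only the middle feature survives.
```

The step function has zero gradient almost everywhere, so a trainable version of this gate would need a surrogate gradient in the backward pass; that part is omitted here.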

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Residual Connection