TL;DR
This paper introduces SBPN, a multilingual ASR model for Nigerian languages that uses knowledge distillation and self-improvement to significantly reduce error rates and outperform existing models.
Contribution
The paper presents a novel two-stage distillation framework and releases SBPN, a foundational multilingual ASR model for Nigerian languages, enhancing resource availability.
Findings
Achieved an average 29% relative WER reduction over monolingual baselines.
Outperformed state-of-the-art multilingual models on major benchmarks.
Released SBPN models in two sizes to support further research.
Abstract
Although modern multilingual Automatic Speech Recognition (ASR) systems support several Nigerian languages, their performance consistently lags behind high-resource languages like English and French. Nigerian languages present unique modelling hurdles, including acute data scarcity, inconsistent orthography, tonal diacritics, diverse accents, frequent code-switching, and localized named entities. To address these challenges, we developed a multilingual ASR framework utilizing a two-stage distillation process. First, we employ student-teacher knowledge distillation from existing monolingual models, conditioned on robust language-specific N-gram language models. Second, we perform iterative self-improvement using pseudo-labelled data to further refine accuracy. Our method significantly bridges the performance gap, achieving on average a relative Word Error Rate (WER) reduction of 29%…
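For readers unfamiliar with the metric, the sketch below shows how a WER and a relative WER reduction are conventionally computed. It is not the paper's evaluation code: the function names `word_error_rate` and `relative_wer_reduction` are hypothetical, tokenization is assumed to be plain whitespace splitting (no diacritic or case normalization), and the numbers in the usage lines are made up for illustration rather than taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only, not results from the paper.
example_wer = word_error_rate("a b c d", "a x c")
example_reduction = relative_wer_reduction(0.40, 0.30)
print(f"WER: {example_wer:.2f}, relative reduction: {example_reduction:.0%}")
```

Note that a relative reduction is a ratio against the baseline, so it is not the same as the absolute difference in WER points: in the illustrative call, a drop from 0.40 to 0.30 is 10 points absolute but a quarter of the baseline error.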
