Brazilian Portuguese Speech Recognition Using Wav2vec 2.0

Lucas Rafael Stefanel Gris; Edresson Casanova; Frederico Santos de Oliveira; Anderson da Silva Soares; Arnaldo Candido Junior

arXiv:2107.11414 · cs.CL · December 23, 2021 · 1 citation

TL;DR

This paper develops a Brazilian Portuguese (BP) speech recognition system by fine-tuning a multilingual pre-trained Wav2vec 2.0 model on openly available data. The authors report the lowest error rate among open end-to-end ASR models for BP.

Contribution

It introduces an open-source BP ASR system fine-tuned from the multilingual Wav2vec 2.0 XLSR-53 model, which the authors report as having the lowest error among open end-to-end models for BP.

Findings

01 Average WER of 12.4% across 7 datasets

02 WER drops to 10.5% when a language model is applied

03 Lowest reported error among open end-to-end BP ASR models

Abstract

Deep learning techniques have proven effective in various tasks, especially in the development of speech recognition systems, that is, systems that transcribe spoken audio into a sequence of written words. Despite progress in the area, speech recognition remains difficult, especially for languages with little available data, such as Brazilian Portuguese (BP). This work presents the development of a public Automatic Speech Recognition (ASR) system using only openly available audio data, obtained by fine-tuning the Wav2vec 2.0 XLSR-53 model, pre-trained on many languages, on BP data. The final model achieves an average word error rate of 12.4% over 7 different datasets (10.5% when applying a language model). To the best of our knowledge, this is the lowest error among open end-to-end (E2E) ASR models for BP.
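
The word error rate (WER) reported in the abstract is conventionally defined as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. The following is a minimal Python sketch of that standard definition, not the authors' evaluation code. The function name `word_error_rate` and the example sentences are illustrative assumptions, and no text normalization (casing, punctuation) is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; real evaluations usually
    normalize text (lowercasing, punctuation removal) before scoring.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref_words)


# Made-up example: one substitution ("gato" -> "rato") and one
# deletion ("no") against a five-word reference.
example_wer = word_error_rate("o gato dorme no sofá", "o rato dorme sofá")
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words. An average such as the one in the abstract would be the mean of per-dataset scores of this kind.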

Code & Models

Repositories

lucasgris/wav2vec4bp (PyTorch, official)

Models

lgris/bp400-xlsr (Hugging Face model)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing