A Large Dataset of Spontaneous Speech with the Accent Spoken in São Paulo for Automatic Speech Recognition Evaluation
Rodrigo Lima, Sidney Evaldo Leal, Arnaldo Candido Junior, Sandra Maria, Aluísio

TL;DR
This paper introduces a large, freely available spontaneous speech corpus of São Paulo Portuguese and evaluates automatic speech recognition models fine-tuned on it, reporting word error rates (WER) of 24.22% for Distil-Whisper and 33.73% for Wav2Vec2-XLSR-53.
Contribution
The paper presents what is, to the best of the authors' knowledge, the first large spontaneous speech corpus of São Paulo Portuguese for ASR, along with baseline experiments and fine-tuned models to facilitate future research.
Findings
Distil-Whisper achieved a WER of 24.22% on the corpus.
Wav2Vec2-XLSR-53 achieved a WER of 33.73%.
The corpus and models are publicly available for reproducibility.
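The WER figures above count word-level substitutions, deletions, and insertions against the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation script: the function name `word_error_rate` and the example sentences are hypothetical, and it assumes whitespace tokenization with no text normalization, which the paper's pipeline may handle differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: one word dropped out of five reference words.
print(word_error_rate("eu moro em são paulo", "eu moro são paulo"))
```

Under this definition, a WER of 24.22% corresponds to roughly one word-level error for every four reference words.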
Abstract
We present a freely available spontaneous speech corpus for the Brazilian Portuguese language and report preliminary automatic speech recognition (ASR) results, using both the Wav2Vec2-XLSR-53 and Distil-Whisper models fine-tuned and trained on our corpus. The NURC-SP Audio Corpus comprises 401 different speakers (204 female, 197 male) with a total of 239.30 hours of transcribed audio recordings. To the best of our knowledge, this is the first large Paulistano-accented spontaneous speech corpus dedicated to the ASR task in Portuguese. We first present the design and development procedures of the NURC-SP Audio Corpus, and then describe four ASR experiments in detail. The experiments demonstrated promising results for the applicability of the corpus to ASR. Specifically, we fine-tuned two versions of the Wav2Vec2-XLSR-53 model, trained a Distil-Whisper model using our dataset with labels…
Taxonomy
Topics: Speech Recognition and Synthesis
