Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge
Alef Iury Siqueira Ferreira, Gustavo dos Reis Oliveira

TL;DR
This paper describes a domain-specific fine-tuning approach for Wav2vec 2.0 aimed at Portuguese automatic speech recognition, which outperformed the challenge baseline in 3 of the 4 tracks of SE&R 2022.
Contribution
It introduces a domain-specific fine-tuning method with gain normalization and noise insertion for Wav2vec 2.0, tailored for Portuguese spontaneous and prepared speech.
Findings
Improved performance over baseline in 3 out of 4 tracks
Effective domain adaptation for Portuguese speech recognition
Enhanced robustness with gain normalization and noise techniques
Abstract
This paper presents our efforts to build a robust ASR model for the shared task Automatic Speech Recognition for spontaneous and prepared speech & Speech Emotion Recognition in Portuguese (SE&R 2022). The goal of the challenge is to advance ASR research for the Portuguese language, considering prepared and spontaneous speech in different dialects. Our method consists of fine-tuning an ASR model in a domain-specific approach, applying gain normalization and selective noise insertion. The proposed method improved over the strong baseline provided on the test set in 3 of the 4 tracks available.
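The two preprocessing steps named in the abstract, gain normalization and selective noise insertion, can be illustrated with a minimal sketch. This is not the authors' implementation: the target level of -20 dBFS, the 15 dB signal-to-noise ratio, the function names, and the choice to model "selective" as a per-utterance probability are all assumptions made for illustration only.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a list of float samples in [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_gain(samples, target_dbfs=-20.0):
    """Scale samples so their RMS level matches target_dbfs (assumed value)."""
    current = rms(samples)
    if current == 0.0:
        return list(samples)  # silence: nothing to scale
    target = 10.0 ** (target_dbfs / 20.0)
    gain = target / current
    # Clip peaks that exceed full scale; this can lower the level slightly.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def add_noise(samples, noise, snr_db):
    """Mix noise into samples at roughly the requested SNR in dB."""
    sig_rms, noise_rms = rms(samples), rms(noise)
    if sig_rms == 0.0 or noise_rms == 0.0:
        return list(samples)
    # snr_db = 20 * log10(sig_rms / scaled_noise_rms), solved for the scale.
    scale = sig_rms / (noise_rms * 10.0 ** (snr_db / 20.0))
    # The noise is looped if it is shorter than the signal, in which case
    # the achieved SNR is only approximate.
    return [s + scale * noise[i % len(noise)] for i, s in enumerate(samples)]


def augment(samples, noise, snr_db=15.0, noise_prob=0.5, rng=random):
    """Normalize gain, then insert noise with probability noise_prob.

    Modelling "selective" insertion as a random per-utterance choice is an
    assumption; the paper may select utterances by a different criterion.
    """
    out = normalize_gain(samples)
    if rng.random() < noise_prob:
        out = add_noise(out, noise, snr_db)
    return out
```

In a fine-tuning pipeline, a function like `augment` would typically be applied to each training waveform before feature extraction, with `noise` drawn from a separate noise corpus.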
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Phonetics and Phonology Research
Methods: Test
