Golos: Russian Dataset for Speech Research
Nikolay Karpov, Alexander Denisenko, Fedor Minkin

TL;DR
This paper presents Golos, a large, freely available Russian speech dataset with about 1240 hours of manually annotated audio, along with a CTC-based acoustic model improved by transfer learning. With beam-search decoding and a 3-gram language model, the reported word error rates are about 3.3% and 11.5%.
Contribution
Introduction of Golos, a comprehensive Russian speech dataset, and development of an acoustic model with transfer learning for improved speech recognition performance.
Findings
Golos dataset contains approximately 1240 hours of annotated speech.
Achieved word error rates of about 3.3% and 11.5% with the acoustic model, decoded with beam search and a 3-gram language model.
Transfer learning improved the model's accuracy on the dataset.
Abstract
This paper introduces a novel Russian speech dataset called Golos, a large corpus suitable for speech research. The dataset mainly consists of recorded audio files manually annotated on the crowd-sourcing platform. The total duration of the audio is about 1240 hours. We have made the corpus freely available to download, along with the acoustic model with CTC loss prepared on this corpus. Additionally, transfer learning was applied to improve the performance of the acoustic model. In order to evaluate the quality of the dataset with the beam-search algorithm, we have built a 3-gram language model on the open Common Crawl dataset. The total word error rate (WER) metrics turned out to be about 3.3% and 11.5%.
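The word error rate quoted in the abstract is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the sample sentences are made up for the example, and text normalization (casing, punctuation) is assumed to have been done beforehand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substituted word out of four.
print(word_error_rate("включи свет на кухне", "включи свет в кухне"))
```

Under this definition, a WER of 3.3% means roughly 3.3 word-level errors (substitutions, deletions, or insertions) per 100 reference words.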
Taxonomy
Methods: Connectionist Temporal Classification Loss
