Experiments on Turkish ASR with Self-Supervised Speech Representation Learning

Ali Safaya; Engin Erzin

arXiv:2210.07323 · cs.CL · December 26, 2022

Open Access

TL;DR

This paper explores Turkish automatic speech recognition (ASR) using self-supervised learning with HuBERT, pre-trained on 6,500 hours of YouTube speech data, and highlights the models' current lack of robustness in real-world applications.

Contribution

It presents the first large-scale pre-training of HuBERT for Turkish ASR using extensive online data, and analyzes the resulting models' performance and limitations.

Findings

01 Models lack robustness against real-world disturbances

02 Significant errors identified in accent and noise variations

03 Pre-training improves baseline but needs further enhancement

Abstract

While the Turkish language is listed among low-resource languages, literature on Turkish automatic speech recognition (ASR) is relatively old. In this report, we present our findings on Turkish ASR with speech representation learning using HuBERT. We investigate pre-training HuBERT for Turkish with large-scale data curated from online resources. We pre-train our model using 6,500 hours of speech data from YouTube. The results show that the models are not ready for commercial use, since they are not robust against disturbances that typically occur in real-world settings, such as variations in accents, slang, background noise, and interference. We analyze typical errors and the limitations of the models for use in commercial settings.

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques

Methods: Balanced Selection