Automatic Speech Recognition for Speech Assessment of Persian Preschool Children
Amirhossein Abaskohi, Fatemeh Mortazavi, Hadi Moradi

TL;DR
This paper introduces a modified Wav2Vec 2.0 model with a new Random Frequency Pitch objective to improve speech recognition accuracy for Persian preschool children, especially in online assessments and low-data scenarios.
Contribution
The study proposes a new Random Frequency Pitch (RFP) objective for Wav2Vec 2.0 and fine-tunes the model on a newly introduced Persian speech dataset, reporting a WER of 1.35 on that dataset and effective zero- and few-shot performance.
Findings
Achieved a WER of 1.35 with the masking and RFP objectives on the authors' dataset.
Reached a WER of 6.45 on the Persian CommonVoice dataset.
Demonstrated effectiveness in zero- and few-shot learning scenarios.
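For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate (WER) is conventionally computed: the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The function name `word_error_rate` is illustrative and is not taken from the paper, and this summary does not state whether the reported figures are fractions or percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.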
Abstract
Preschool evaluation is crucial because it gives teachers and parents valuable knowledge about children's growth and development. The COVID-19 pandemic has highlighted the necessity of online assessment for preschool children. One of the areas that should be tested is their ability to speak. Employing an existing Automatic Speech Recognition (ASR) system would not help, since such systems are pre-trained on voices that differ from children's in frequency and amplitude. Because most of them are pre-trained on data within a specific amplitude range, their objectives do not prepare them for voices at other amplitudes. To overcome this issue, we added a new objective, called Random Frequency Pitch (RFP), to the masking objective of the Wav2Vec 2.0 model. In addition, we used our newly introduced dataset to fine-tune our model for Meaningless Words (MW) and Rapid Automatic Naming (RAN)…
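The abstract above is truncated and does not spell out how the RFP objective is defined, so the following is only an illustrative sketch of the general idea of exposing a model to randomly pitch-shifted audio. It is not the authors' method. It uses naive resampling, which changes the clip duration along with the pitch; the function name `random_pitch_shift` and the `max_semitones` parameter are hypothetical.

```python
import numpy as np


def random_pitch_shift(wave, rng, max_semitones=4.0):
    """Shift pitch by a random number of semitones via naive resampling.

    Assumption for illustration only: this is a generic augmentation,
    not the RFP objective from the paper. Duration changes with pitch.
    """
    semitones = rng.uniform(-max_semitones, max_semitones)
    factor = 2.0 ** (semitones / 12.0)  # frequency ratio for the shift

    # Reading the samples `factor` times faster scales every frequency
    # by `factor` and makes the clip `factor` times shorter.
    n_out = max(1, int(round(len(wave) / factor)))
    src_positions = np.arange(n_out) * factor
    shifted = np.interp(src_positions, np.arange(len(wave)), wave)
    return shifted, semitones


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 220.0 * t)  # one second of a 220 Hz tone
    shifted, semitones = random_pitch_shift(tone, rng)
    print(f"shift: {semitones:+.2f} semitones, {len(tone)} -> {len(shifted)} samples")
```

A duration-preserving shift would need a time-stretching step as well, which is left out here to keep the sketch dependency-light.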
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Infant Health and Development
Methods: 1x1 Convolution · Sigmoid Activation · Recursive Feature Pyramid
