TL;DR
This paper presents the first large-scale study of Sanskrit ASR. It releases a new 78-hour speech corpus, examines how the choice of modelling unit affects recognition, proposes a syllable-inspired modelling unit, and reports improved performance for Sanskrit and other Indic languages.
Contribution
The paper releases a 78-hour Sanskrit speech corpus, investigates the role of different acoustic model and language model units, proposes a new syllable-inspired modelling unit, and extends the findings to Gujarati and Telugu.
Findings
Syllable-inspired units improve ASR accuracy.
Phonetic-based graphemic representations outperform native scripts.
The insights from Sanskrit carry over to two other Indic languages, Gujarati and Telugu.
Abstract
Automatic speech recognition (ASR) in Sanskrit is interesting, owing to the various linguistic peculiarities present in the language. The Sanskrit language is lexically productive, undergoes euphonic assimilation of phones at the word boundaries and exhibits variations in spelling conventions and in pronunciations. In this work, we propose the first large scale study of automatic speech recognition (ASR) in Sanskrit, with an emphasis on the impact of unit selection in Sanskrit ASR. In this work, we release a 78 hour ASR dataset for Sanskrit, which faithfully captures several of the linguistic characteristics expressed by the language. We investigate the role of different acoustic model and language model units in ASR systems for Sanskrit. We also propose a new modelling unit, inspired by the syllable level unit selection, that captures character sequences from one vowel in the word to…
