Investigating Transcription Normalization in the Faetar ASR Benchmark
Leo Peckham, Michael Ong, Naomi Nagy, Ewan Dunbar

TL;DR
This paper investigates transcription inconsistencies in the Faetar ASR benchmark, finding they are not the main challenge, and explores language modeling and decoding constraints to improve low-resource speech recognition.
Contribution
It provides an analysis of transcription issues in Faetar ASR and evaluates the impact of language modeling and lexicon constraints on performance.
Findings
Transcription inconsistencies are not the primary challenge.
Bigram language models do not improve performance.
Lexicon-constrained decoding can be beneficial.
Abstract
We examine the role of transcription inconsistencies in the Faetar Automatic Speech Recognition benchmark, a challenging low-resource ASR benchmark. With the help of a small, hand-constructed lexicon, we find that, while inconsistencies do exist in the transcriptions, they are not the main challenge in the task. We also demonstrate that bigram word-based language modelling is of no added benefit, but that constraining decoding to a finite lexicon can be beneficial. The task remains extremely difficult.
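To make the idea of restricting output to a finite lexicon concrete, here is a minimal sketch of a simplified stand-in: each word in a recognizer hypothesis is replaced by the closest lexicon entry under Levenshtein distance. This is a post-hoc projection, not the decoding procedure used in the paper, which the abstract does not describe in detail. The function names `edit_distance` and `project_to_lexicon` and the toy lexicon in the usage note are illustrative assumptions, not Faetar data.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, with unit costs."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def project_to_lexicon(hypothesis: str, lexicon: Iterable[str]) -> List[str]:
    """Map each whitespace-separated token to its nearest lexicon entry.

    Assumption: this is a post-hoc approximation of lexicon-constrained
    decoding, not the authors' decoder. Ties go to the alphabetically
    first entry because the lexicon is sorted before searching.
    """
    entries = sorted(set(lexicon))
    return [
        min(entries, key=lambda entry: edit_distance(token, entry))
        for token in hypothesis.split()
    ]
```

For example, with the made-up lexicon `["casa", "pane", "vino"]`, calling `project_to_lexicon("cassa pene vinu", ["casa", "pane", "vino"])` maps every out-of-lexicon token to an in-lexicon word. A real constrained decoder would apply the restriction during search, over the acoustic model's scores, rather than after the fact.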
