TL;DR
This paper investigates how incorporating translations into high-resource languages can enhance speech transcription accuracy in extremely low-resource settings, using a neural multi-source model evaluated on three datasets.
Contribution
It introduces a neural multi-source model with shared attention that leverages translations to improve low-resource speech transcription.
Findings
The multi-source model with shared attention reduces transcription character error rate by up to 12.3%.
The shared attention variant outperforms the baseline models.
The model and several of its variations are evaluated on three low-resource datasets.
Abstract
Recently proposed data collection frameworks for endangered language documentation aim not only to collect speech in the language of interest, but also to collect translations into a high-resource language that will render the collected resource interpretable. We focus on this scenario and explore whether we can improve transcription quality under these extremely low-resource settings with the assistance of text translations. We present a neural multi-source model and evaluate several variations of it on three low-resource datasets. We find that our multi-source model with shared attention outperforms the baselines, reducing transcription character error rate by up to 12.3%.
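The headline metric here is character error rate (CER). As a minimal sketch, assuming the common definition (character-level Levenshtein distance between the hypothesis and the reference transcription, divided by the reference length), it could be computed as follows. The function names are illustrative, and the paper's exact scoring setup (for example, how whitespace or punctuation is handled) is not stated in this summary.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # drop a reference character
            insertion = curr[j - 1] + 1  # add a hypothesis character
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER under the assumed definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, a hypothesis with one wrong character against a four-character reference scores 0.25. Note that CER can exceed 1.0 when the hypothesis is much longer than the reference.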
