Leveraging translations for speech transcription in low-resource settings

Antonis Anastasopoulos; David Chiang

arXiv:1803.08991·cs.CL·June 12, 2018

TL;DR

This paper investigates how incorporating translations into high-resource languages can enhance speech transcription accuracy in extremely low-resource settings, using a neural multi-source model evaluated on three datasets.

Contribution

It introduces a neural multi-source model with shared attention that leverages translations to improve low-resource speech transcription.

Findings

1. Multi-source model reduces character error rate by up to 12.3%.

2. Shared attention mechanism outperforms baseline models.

3. Effective in three low-resource language datasets.

Abstract

Recently proposed data collection frameworks for endangered language documentation aim not only to collect speech in the language of interest, but also to collect translations into a high-resource language that will render the collected resource interpretable. We focus on this scenario and explore whether we can improve transcription quality under these extremely low-resource settings with the assistance of text translations. We present a neural multi-source model and evaluate several variations of it on three low-resource datasets. We find that our multi-source model with shared attention outperforms the baselines, reducing transcription character error rate by up to 12.3%.
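
The headline result is stated in terms of character error rate (CER). As a point of reference, CER is commonly computed as the character-level edit distance between a hypothesis transcription and the reference, divided by the reference length. The sketch below illustrates that common definition; it is not taken from the paper or its repository, and the function names `edit_distance` and `character_error_rate` are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference character missing from hypothesis
                curr[j - 1] + 1,     # extra character in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER under the common definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `character_error_rate("abcd", "abxd")` is 0.25, since one substitution is needed against a four-character reference. Whether the paper normalizes per utterance or over the whole test set is not stated in the abstract, so that choice is left open here.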

Code & Models

Repositories

https://bitbucket.org/antonis/dynet-multisource-models
Official
