Weakly supervised spoken term discovery using cross-lingual side information

Sameer Bansal; Herman Kamper; Sharon Goldwater; Adam Lopez

arXiv:1609.06530·cs.CL·September 22, 2016

TL;DR

This paper introduces a simple method for rescoring the output of an unsupervised term discovery (UTD) system using text translations of the audio. Tested on a corpus of Spanish audio with English translations, it greatly improves the average precision of the discovered terms.

Contribution

A rescoring method that uses (possibly noisy) text translations as side information to improve the output of an unsupervised term discovery system.

Findings

01. Significant improvement in average precision across a wide range of system configurations and data preprocessing methods

02. Effective use of noisy cross-lingual translations as side information

03. Applicable to low-resource languages with available translations

Abstract

Recent work on unsupervised term discovery (UTD) aims to identify and cluster repeated word-like units from audio alone. These systems are promising for some very low-resource languages where transcribed audio is unavailable, or where no written form of the language exists. However, in some cases it may still be feasible (e.g., through crowdsourcing) to obtain (possibly noisy) text translations of the audio. If so, this information could be used as a source of side information to improve UTD. Here, we present a simple method for rescoring the output of a UTD system using text translations, and test it on a corpus of Spanish audio with English translations. We show that it greatly improves the average precision of the results over a wide range of system configurations and data preprocessing methods.
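
The abstract does not spell out the scoring function, so the following is only a minimal sketch of what rescoring UTD output with translations could look like, not the authors' implementation. It assumes each discovered pair comes with an acoustic similarity score in [0, 1] and that each utterance has a tokenized translation. The Jaccard overlap, the linear interpolation, and the names `jaccard`, `rescore`, and `alpha` are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def rescore(pairs, translations, alpha=0.5):
    """Re-rank discovered pairs using translation overlap as side information.

    pairs: list of (utt_id_1, utt_id_2, acoustic_score), scores in [0, 1].
    translations: dict mapping an utterance id to its translation tokens.
    alpha: hypothetical weight placed on the translation similarity.
    """
    rescored = []
    for utt1, utt2, acoustic in pairs:
        overlap = jaccard(translations[utt1], translations[utt2])
        # Assumed combination rule: linear interpolation of the two scores.
        combined = (1 - alpha) * acoustic + alpha * overlap
        rescored.append((utt1, utt2, combined))
    # Highest combined score first.
    return sorted(rescored, key=lambda p: p[2], reverse=True)


# Toy data (invented for this example, not from the paper's corpus).
pairs = [("u1", "u2", 0.9), ("u3", "u4", 0.8)]
translations = {
    "u1": ["the", "weather", "is", "nice"],
    "u2": ["i", "like", "my", "job"],
    "u3": ["good", "morning", "friend"],
    "u4": ["good", "morning", "everyone"],
}
ranked = rescore(pairs, translations)
```

In this toy input the pair (u1, u2) has the higher acoustic score but no shared translation words, so after rescoring it falls below (u3, u4), whose translations overlap. This is the intuition behind using translations as side information: acoustically similar segments with unrelated translations are less likely to be true matches.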
