Weakly supervised spoken term discovery using cross-lingual side information
Sameer Bansal, Herman Kamper, Sharon Goldwater, Adam Lopez

TL;DR
This paper introduces a method for rescoring the output of an unsupervised term discovery (UTD) system using text translations of the audio as side information. Tested on Spanish audio with English translations, it improves the average precision of the discovered terms.
Contribution
A simple rescoring method that uses (possibly noisy) text translations of the audio to improve the output of an unsupervised term discovery system.
Findings
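Large improvement in average precision over a wide range of system configurations and data preprocessing methods
Text translations, even noisy ones, serve as useful side information for UTD
Potentially applicable to low-resource languages where translations can be obtained (e.g., through crowdsourcing), though tested here only on Spanish audio with English translations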
Abstract
Recent work on unsupervised term discovery (UTD) aims to identify and cluster repeated word-like units from audio alone. These systems are promising for some very low-resource languages where transcribed audio is unavailable, or where no written form of the language exists. However, in some cases it may still be feasible (e.g., through crowdsourcing) to obtain (possibly noisy) text translations of the audio. If so, this information could be used as a source of side information to improve UTD. Here, we present a simple method for rescoring the output of a UTD system using text translations, and test it on a corpus of Spanish audio with English translations. We show that it greatly improves the average precision of the results over a wide range of system configurations and data preprocessing methods.
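To make the idea of rescoring with translations concrete, here is a minimal sketch in Python. The abstract does not specify the scoring function, so everything below is an assumption made for illustration: each discovered pair is assumed to carry an acoustic score in [0, 1], translation similarity is assumed to be the Jaccard overlap of content words, and the two are assumed to be combined by linear interpolation with a hypothetical weight `alpha`. The names `Pair`, `content_words`, `jaccard`, `rescore`, and the small stopword list are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Tiny illustrative stopword list (an assumption, not the paper's preprocessing).
STOPWORDS = frozenset({"the", "a", "an", "of", "to", "and", "in", "is", "it"})


@dataclass(frozen=True)
class Pair:
    """A candidate match proposed by a UTD system (hypothetical structure)."""
    utt_a: str             # ID of the utterance containing the first segment
    utt_b: str             # ID of the utterance containing the second segment
    acoustic_score: float  # assumed to lie in [0, 1]; higher means more similar


def content_words(translation):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in translation.lower().split() if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rescore(pairs, translations, alpha=0.5):
    """Interpolate acoustic score with translation overlap, then rank.

    Returns a list of (pair, new_score) sorted from best to worst.
    """
    rescored = []
    for p in pairs:
        sim = jaccard(content_words(translations[p.utt_a]),
                      content_words(translations[p.utt_b]))
        rescored.append((p, (1 - alpha) * p.acoustic_score + alpha * sim))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy data: u1 and u2 share the translated word "dog"; u1 and u3 share nothing.
translations = {
    "u1": "the dog is in the house",
    "u2": "a dog and a cat",
    "u3": "i went to the market",
}
pairs = [Pair("u1", "u3", 0.7), Pair("u1", "u2", 0.6)]
ranked = rescore(pairs, translations, alpha=0.5)
```

In this toy example the pair (u1, u3) has the higher acoustic score, but its translations share no content words, so after rescoring it falls below (u1, u2). Setting `alpha=0` recovers the original acoustic ranking, which gives a natural baseline for comparing average precision before and after rescoring.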
