AlloST: Low-resource Speech Translation without Source Transcription

Yao-Fei Cheng, Hung-Shin Lee, and Hsin-Min Wang

arXiv:2105.00171 · cs.CL · March 31, 2022

TL;DR

This paper introduces a low-resource speech translation framework that combines a universal phone recognizer with phonetic embeddings, improving translation quality over conformer-based baselines without relying on source transcriptions.

Contribution

It proposes an attention-based sequence-to-sequence model that uses phonetic embeddings and BPE segmentation to advance low-resource speech translation without source transcription.

Findings

01. Outperforms conformer-based baseline models

02. Achieves performance close to methods using source transcription

03. Effective on Spanish-English and Mandarin dialect corpora

Abstract

The end-to-end architecture has made promising progress in speech translation (ST). However, the ST task is still challenging under low-resource conditions. Most ST models have shown unsatisfactory results, especially in the absence of word information from the source speech utterance. In this study, we survey methods to improve ST performance without using source transcription, and propose a learning framework that utilizes a language-independent universal phone recognizer. The framework is based on an attention-based sequence-to-sequence model, where the encoder generates the phonetic embeddings and phone-aware acoustic representations, and the decoder controls the fusion of the two embedding streams to produce the target token sequence. In addition to investigating different fusion strategies, we explore the specific usage of byte pair encoding (BPE), which compresses a phone…
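The abstract's last (truncated) sentence mentions byte pair encoding applied to phones. As a rough illustration only, and assuming BPE is run over sequences of phone symbols, the idea can be sketched as below. The function names (`learn_bpe`, `apply_merge`, `encode`), the `+` joiner, and the toy phone corpus are all hypothetical and are not taken from the paper or its repository.

```python
from collections import Counter

def apply_merge(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with one merged unit."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + "+" + seq[i + 1])  # "+" joiner is an arbitrary choice
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def learn_bpe(sequences, num_merges):
    """Learn up to `num_merges` merges from phone sequences (lists of symbols)."""
    seqs = [list(s) for s in sequences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))  # count adjacent phone pairs
        if not pairs:
            break
        # Most frequent pair; sorting first makes tie-breaking deterministic.
        best = max(sorted(pairs), key=pairs.get)
        merges.append(best)
        seqs = [apply_merge(s, best) for s in seqs]
    return merges

def encode(seq, merges):
    """Apply learned merges, in order, to a new phone sequence."""
    seq = list(seq)
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq

# Toy phone corpus (hypothetical symbols, not real recognizer output).
corpus = [["o", "l", "a"], ["o", "l", "a", "s"], ["k", "o", "l", "a"]]
merges = learn_bpe(corpus, num_merges=2)
compressed = encode(["o", "l", "a", "s"], merges)
print(merges, compressed)
```

Each merge replaces a frequent pair of adjacent units with a single unit, so the encoded sequence is never longer than the input phone sequence. How the paper actually configures BPE (vocabulary size, tooling) is not stated in the text above.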

Code & Models

Repositories

jamfly/AlloST (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling