ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks

Marcely Zanon Boito; John Ortega; Hugo Riguidel; Antoine Laurent; Loïc Barrault; Fethi Bougares; Firas Chaabani; Ha Nguyen; Florentin Barbier; Souhir Gahbiche; Yannick Estève

arXiv:2205.01987·cs.CL·May 5, 2022

Open Access

TL;DR

This paper presents the ON-TRAC Consortium systems for IWSLT 2022, comparing end-to-end and pipeline speech translation models on the low-resource and dialect tracks and highlighting the effectiveness of transfer learning and phonetic transcriptions.

Contribution

It describes speech translation systems for Tunisian Arabic-English (low-resource and dialect tracks) and Tamasheq-French (low-resource track), and shows the benefits of transfer learning and of approximate phonetic transcriptions.

Findings

01

With transfer learning, pipeline (cascaded) approaches can outperform end-to-end models for speech translation.

02

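Self-supervised models trained on smaller sets of target-language data are more effective for low-resource end-to-end speech translation fine-tuning than large off-the-shelf models.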

03

Approximate phonetic transcriptions can improve speech translation scores.

Abstract

This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and dialect speech translation. For the Tunisian Arabic-English dataset (low-resource and dialect tracks), we build an end-to-end model as our joint primary submission, and compare it against cascaded models that leverage a large fine-tuned wav2vec 2.0 model for ASR. Our results show that in our settings pipeline approaches are still very competitive, and that with the use of transfer learning, they can outperform end-to-end models for speech translation (ST). For the Tamasheq-French dataset (low-resource track) our primary submission leverages intermediate representations from a wav2vec 2.0 model trained on 234 hours of Tamasheq audio, while our contrastive model uses a French phonetic transcription of the Tamasheq audio as…
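The difference between the cascaded (pipeline) and end-to-end setups compared in the abstract can be illustrated with a minimal sketch. Everything here is hypothetical: `Audio`, `make_cascade`, and the toy `toy_asr`, `toy_mt`, and `toy_end_to_end` functions are stand-ins invented for illustration, not the authors' models or any library's API. The sketch only shows how the two approaches are wired together.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a waveform: a sequence of samples.
Audio = Sequence[float]


def make_cascade(
    asr: Callable[[Audio], str],
    mt: Callable[[str], str],
) -> Callable[[Audio], str]:
    """Build a pipeline ST system: transcribe first, then translate."""

    def translate(audio: Audio) -> str:
        transcript = asr(audio)  # step 1: speech -> source-language text
        return mt(transcript)    # step 2: source text -> target text

    return translate


# Toy components (assumptions for illustration only, not real models).
def toy_asr(audio: Audio) -> str:
    # Pretend any non-empty audio is recognized as a fixed transcript.
    return "source words" if len(audio) > 0 else ""


def toy_mt(text: str) -> str:
    # Pretend "translation" is upper-casing the transcript.
    return text.upper()


def toy_end_to_end(audio: Audio) -> str:
    # An end-to-end system maps audio to target text directly,
    # with no intermediate transcript exposed.
    return "SOURCE WORDS" if len(audio) > 0 else ""


cascade = make_cascade(toy_asr, toy_mt)
```

In a cascade, the ASR component can be swapped or fine-tuned independently of the translation component, which is where a separately fine-tuned ASR model would plug in. An end-to-end system has no such intermediate interface.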

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling