CTC Alignments Improve Autoregressive Translation

Brian Yan; Siddharth Dalmia; Yosuke Higuchi; Graham Neubig; Florian Metze; Alan W Black; Shinji Watanabe

arXiv:2210.05200·cs.CL·October 12, 2022·1 cites

Open Access

TL;DR

This paper shows that combining CTC with an attentional decoder in a joint framework improves translation quality on both speech-to-text and text-to-text translation, outperforming pure-attention models.

Contribution

The paper introduces a joint CTC/attention model for translation, adapting ASR techniques to improve translation performance across multiple benchmarks.

Findings

01. Joint CTC/attention models outperform pure-attention baselines.

02. CTC helps mitigate weaknesses of attention models during training and decoding.

03. Model improvements are validated on six benchmark translation tasks.
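
The interpolation idea behind findings 01 and 02 can be sketched as follows. This is a minimal illustration and not the authors' implementation. The function names `joint_loss`, `joint_score`, and `rerank` and the weight 0.3 are assumptions made for this example. The sketch also rescores finished hypotheses, whereas a hybrid CTC/attention decoder would typically combine the two scores during beam search.

```python
import math


def joint_loss(ctc_loss: float, att_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate the CTC and attention negative log-likelihoods.

    ctc_weight is a hypothetical hyperparameter in [0, 1]; 0.3 is only
    an illustrative default, not a value taken from the paper.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def joint_score(ctc_logp: float, att_logp: float, ctc_weight: float = 0.3) -> float:
    """Combine the two log-probabilities of one hypothesis with the same weight."""
    return ctc_weight * ctc_logp + (1.0 - ctc_weight) * att_logp


def rerank(hyps, ctc_weight: float = 0.3) -> str:
    """Return the text of the hypothesis with the highest joint score.

    hyps is a list of (text, ctc_logp, att_logp) tuples. Rescoring complete
    hypotheses is a simplification assumed here for brevity.
    """
    best = max(hyps, key=lambda h: joint_score(h[1], h[2], ctc_weight))
    return best[0]


if __name__ == "__main__":
    # Made-up scores: the attention decoder slightly prefers "a",
    # while CTC strongly prefers "b".
    hyps = [("a", -5.0, -1.0), ("b", -2.0, -1.5)]
    print(rerank(hyps, ctc_weight=0.0))  # attention only
    print(rerank(hyps, ctc_weight=0.3))  # joint scoring
```

With the weight set to 0 the ranking depends on the attention score alone. A non-zero weight lets the CTC score veto hypotheses that the attention decoder rates highly but that CTC finds implausible.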

Abstract

Connectionist Temporal Classification (CTC) is a widely used approach for automatic speech recognition (ASR) that performs conditionally independent monotonic alignment. However, for translation, CTC exhibits clear limitations due to the contextual and non-monotonic nature of the task and thus lags behind attentional decoder approaches in terms of translation quality. In this work, we argue that CTC does in fact make sense for translation if applied in a joint CTC/attention framework wherein CTC's core properties can counteract several key weaknesses of pure-attention models during training and decoding. To validate this conjecture, we modify the Hybrid CTC/Attention model originally proposed for ASR to support text-to-text translation (MT) and speech-to-text translation (ST). Our proposed joint CTC/attention models outperform pure-attention baselines across six benchmark translation…
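
To make "monotonic alignment" concrete, here is a small sketch of the standard CTC collapsing rule, which maps a frame-level label sequence to an output sequence by merging consecutive repeats and then dropping blanks. The name `ctc_collapse` and the `<blank>` symbol are assumptions chosen for this illustration; the abstract does not specify them.

```python
from itertools import groupby

BLANK = "<blank>"  # hypothetical blank symbol, chosen for this example


def ctc_collapse(frames):
    """Map a frame-level CTC path to its output sequence.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove blank labels.
    Output order follows input order, which is why the alignment is monotonic.
    """
    merged = [label for label, _ in groupby(frames)]
    return [label for label in merged if label != BLANK]


if __name__ == "__main__":
    path = [BLANK, "a", "a", BLANK, "a", "b", "b"]
    # The blank between the two runs of "a" keeps them as separate tokens.
    print(ctc_collapse(path))
```

Because each output token must come from a position no earlier than the previous token's position, a CTC path cannot reorder content. This is the limitation for translation that the abstract describes.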

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing