Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation
Biao Zhang, Barry Haddow, Rico Sennrich

TL;DR
This paper introduces CoLaCTC, a method that shrinks the CTC label space used to regularize end-to-end speech translation models, improving training efficiency while maintaining or improving translation quality.
Contribution
It proposes coarse labeling strategies for CTC that merge vocabulary labels with simple heuristic rules, reducing the parameters and computation of the CTC prediction layer without sacrificing translation performance.
Findings
CoLaCTC compresses the CTC label space to 256 labels or fewer.
It achieves a 1.18x to 1.77x training speedup.
It maintains or improves translation quality across multiple languages.
Abstract
For end-to-end speech translation, regularizing the encoder with the Connectionist Temporal Classification (CTC) objective using the source transcript or target translation as labels can greatly improve quality metrics. However, CTC demands an extra prediction layer over the vocabulary space, bringing in nonnegligible model parameters and computational overheads, although this layer is typically not used for inference. In this paper, we re-examine the need for genuine vocabulary labels for CTC for regularization and explore strategies to reduce the CTC label space, targeting improved efficiency without quality degradation. We propose coarse labeling for CTC (CoLaCTC), which merges vocabulary labels via simple heuristic rules, such as using truncation, division or modulo (MOD) operations. Despite its simplicity, our experiments on 4 source and 8 target languages show that CoLaCTC with…
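The three merging rules named in the abstract (truncation, division, modulo) can be illustrated with a minimal sketch. The exact formulas below are assumptions made for illustration, not the paper's definitions: the function name `coarse_label`, the vocabulary size of 16000, and the omission of the CTC blank symbol are all hypothetical choices.

```python
import math

def coarse_label(token_id: int, vocab_size: int, num_labels: int, strategy: str) -> int:
    """Map a vocabulary id in [0, vocab_size) to a coarse label in [0, num_labels).

    Illustrative assumption only: these formulas are one plausible reading of
    "truncation, division or modulo", not the authors' exact definitions.
    The CTC blank symbol is not handled here.
    """
    if not 0 <= token_id < vocab_size:
        raise ValueError("token_id out of range")
    if strategy == "truncation":
        # Keep the first num_labels - 1 ids, merge all others into the last label.
        return min(token_id, num_labels - 1)
    if strategy == "division":
        # Merge contiguous blocks of ids into one label.
        block = math.ceil(vocab_size / num_labels)
        return token_id // block
    if strategy == "mod":
        # Merge ids that share the same remainder.
        return token_id % num_labels
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical setup: a 16000-entry vocabulary compressed to 256 coarse labels.
ids = [0, 255, 256, 15999]
mod_labels = [coarse_label(i, 16000, 256, "mod") for i in ids]
tru_labels = [coarse_label(i, 16000, 256, "truncation") for i in ids]
div_labels = [coarse_label(i, 16000, 256, "division") for i in ids]
```

Under any of these rules the CTC prediction layer would only need `num_labels` outputs instead of one per vocabulary entry, which is where the reduction in parameters and computation would come from.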
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Cancer-related molecular mechanisms research
