Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework
Eliya Segev, Maya Alroy, Ronen Katsir, Noam Wies, Ayana Shenhav, Yael Ben-Oren, David Zar, Oren Tadmor, Jacob Bitterman, Amnon Shashua, Tal Rosenwein

TL;DR
This paper introduces a flexible plug-and-play framework to optimize specific alignment properties in CTC-trained models, demonstrated on large-scale speech recognition with improvements in latency and accuracy.
Contribution
It presents a novel, easy-to-integrate method that enhances CTC models by incorporating additional property-focused loss terms without altering the original CTC loss.
Findings
Improved emission latency by up to 570 ms
Achieved a 4.5% relative WER reduction
Demonstrated scalability to large datasets (up to 280,000 hours)
Abstract
Connectionist Temporal Classification (CTC) is a widely used criterion for training supervised sequence-to-sequence (seq2seq) models. It enables learning the relations between input and output sequences, termed alignments, by marginalizing over perfect alignments (that yield the ground truth), at the expense of imperfect alignments. This binary differentiation of perfect and imperfect alignments falls short of capturing other essential alignment properties that hold significance in other real-world applications. Here we propose Align With Purpose, a general plug-and-play framework for enhancing a desired property in models trained with the CTC criterion. We do that by complementing the CTC with an additional loss term that prioritizes alignments according to a desired property. Our method does not require any intervention in the CTC loss function, enables easy…
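The idea of complementing the CTC loss with a property-focused term can be illustrated with a small sketch. It is not the authors' implementation: the hinge-style form of the extra term, the function names (`alignment_log_prob`, `property_hinge_loss`, `combined_loss`), the `margin` and `weight` parameters, and the toy two-token vocabulary are all assumptions made for illustration. The CTC loss value is treated as already computed, so the CTC loss itself is left untouched.

```python
import math


def alignment_log_prob(log_probs, alignment):
    """Sum the per-frame log-probabilities along one frame-level alignment."""
    return sum(frame[token] for frame, token in zip(log_probs, alignment))


def property_hinge_loss(log_probs, sampled, preferred, margin=0.0):
    """Hypothetical extra term: positive while the preferred alignment is not
    more likely than the sampled one by at least `margin`, zero otherwise."""
    gap = alignment_log_prob(log_probs, sampled) - alignment_log_prob(log_probs, preferred)
    return max(0.0, gap + margin)


def combined_loss(ctc_loss, log_probs, sampled, preferred, weight=1.0):
    """Add the weighted property term to an already-computed CTC loss value."""
    return ctc_loss + weight * property_hinge_loss(log_probs, sampled, preferred)


# Toy example: 4 frames, vocabulary {0: blank, 1: "a"} (assumed for illustration).
prob_a = [0.1, 0.2, 0.7, 0.1]
log_probs = [[math.log(1.0 - p), math.log(p)] for p in prob_a]

sampled = [0, 0, 1, 0]    # emits "a" at frame 2
preferred = [0, 1, 0, 0]  # same output, emitted one frame earlier (lower latency)

extra = property_hinge_loss(log_probs, sampled, preferred)
total = combined_loss(ctc_loss=1.5, log_probs=log_probs, sampled=sampled, preferred=preferred)
```

In this toy case both alignments decode to the same output, so a CTC-style criterion treats them alike; the extra term is what distinguishes them, and it stays positive until the earlier-emitting alignment becomes the more likely one.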
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
Methods: Connectionist Temporal Classification Loss
