Less Peaky and More Accurate CTC Forced Alignment by Label Priors
Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, Sanjeev Khudanpur

TL;DR
This paper introduces a modified CTC model that reduces peaky output distributions by leveraging label priors, leading to more accurate forced alignments at phoneme and word levels, with improved efficiency and comparable performance to existing tools.
Contribution
It proposes a novel approach to mitigate CTC peaky behavior using label priors, enhancing alignment accuracy and training simplicity.
Findings
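Reduces phoneme and word boundary errors (PBE and WBE) by 12-40% relative to standard CTC and a heuristics-based offset approach, measured on Buckeye and TIMIT.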
Produces less peaky posteriors and more accurate token offsets.
Offers a simpler, more efficient training pipeline.
Abstract
Connectionist temporal classification (CTC) models are known to have peaky output distributions. Such behavior is not a problem for automatic speech recognition (ASR), but it can cause inaccurate forced alignments (FA), especially at finer granularity, e.g., at the phoneme level. This paper aims to alleviate the peaky behavior of CTC and improve its suitability for forced alignment generation by leveraging label priors, so that the scores of alignment paths containing fewer blanks are boosted and maximized during training. As a result, our CTC model produces less peaky posteriors and is able to predict the offsets of tokens more accurately, in addition to their onsets. It outperforms the standard CTC model and a heuristics-based approach for obtaining CTC's token offset timestamps by 12-40% in phoneme and word boundary errors (PBE and WBE) measured on the Buckeye and TIMIT data. Compared with…
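The effect of label priors on alignment scores can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not give the formula, so the sketch assumes that priors are estimated by averaging frame posteriors and applied by subtracting a scaled log prior from each frame's log posterior. The function names, the scaling factor `alpha`, and the toy posteriors are all hypothetical. The paper applies the priors inside the training objective; here they are applied to fixed posteriors only to show how a frequent blank label loses score relative to the other labels.

```python
import numpy as np

def estimate_log_priors(log_probs):
    """Assumed estimator: average the frame posteriors over time, shape (V,)."""
    return np.log(np.exp(log_probs).mean(axis=0))

def apply_label_priors(log_probs, log_priors, alpha=1.0):
    """Assumed adjustment: subtract a scaled log prior from every frame."""
    return log_probs - alpha * log_priors

def ctc_viterbi_align(scores, targets, blank=0):
    """Best CTC path for `targets` under `scores` of shape (T, V).

    Returns one label per frame (blank included)."""
    T = scores.shape[0]
    # Extended target sequence: blank, y1, blank, y2, ..., blank
    ext = [blank]
    for y in targets:
        ext += [y, blank]
    S = len(ext)
    dp = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    dp[0, 0] = scores[0, ext[0]]
    if S > 1:
        dp[0, 1] = scores[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [(dp[t - 1, s], s)]                  # stay on the same state
            if s >= 1:
                cands.append((dp[t - 1, s - 1], s - 1))  # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((dp[t - 1, s - 2], s - 2))  # skip the blank in between
            best, prev = max(cands)
            dp[t, s] = best + scores[t, ext[s]]
            back[t, s] = prev
    # A valid path ends on the final blank or on the final label.
    s = S - 1 if (S == 1 or dp[T - 1, S - 1] >= dp[T - 1, S - 2]) else S - 2
    path = []
    for t in range(T - 1, -1, -1):
        path.append(ext[s])
        s = back[t, s]
    return path[::-1]

# Toy posteriors (6 frames, labels: 0 = blank, 1, 2), peaky around frames 1 and 4.
probs = np.array([
    [0.80, 0.10, 0.10],
    [0.10, 0.80, 0.10],
    [0.55, 0.35, 0.10],
    [0.55, 0.10, 0.35],
    [0.10, 0.10, 0.80],
    [0.80, 0.10, 0.10],
])
log_probs = np.log(probs)
targets = [1, 2]

plain_path = ctc_viterbi_align(log_probs, targets)
adjusted = apply_label_priors(log_probs, estimate_log_priors(log_probs), alpha=1.0)
prior_path = ctc_viterbi_align(adjusted, targets)

print("without priors:", plain_path)
print("with priors:   ", prior_path)
```

In this toy case blank has the largest estimated prior, so subtracting the log prior penalizes blank more than the other labels, and the best path keeps each label active for more frames. That is the intuition behind boosting paths with fewer blanks; how the priors are estimated and scaled in the actual model should be taken from the paper itself.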
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Numerical Analysis Techniques
Methods: Feedback Alignment
