CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Ruchao Fan, Yiming Wang, Yashesh Gaur, and Jinyu Li

TL;DR
This paper introduces CTCBERT, a training method for hidden-unit BERT that uses CTC objectives to improve alignment flexibility and overall speech recognition performance, showing consistent WER improvements on Librispeech.
Contribution
The paper proposes CTCBERT, a novel training approach that replaces CE loss with CTC loss for hidden-unit BERT, enhancing alignment learning and speech recognition accuracy.
Findings
CTCBERT outperforms HuBERT with 2%-11% relative WER reduction.
Because CTC learns alignments implicitly, training is more tolerant of misaligned clustered or aligned IDs than frame-level CE training.
Loading blank-related parameters during finetuning yields slight additional improvements.
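As a reminder of how a relative WER reduction is computed, here is a minimal sketch. The function name and the WER values in the usage line are hypothetical and chosen only to illustrate the arithmetic; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are absolute WERs in percent (e.g. 10.0 means 10% WER).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers for illustration only (not taken from the paper):
# a baseline at 10.0% WER and a new system at 9.0% WER.
print(relative_wer_reduction(10.0, 9.0))
```

A relative reduction is always measured against the baseline WER, so the same absolute gain counts for more when the baseline is already low.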
Abstract
In this work, we present a simple but effective method, CTCBERT, for advancing hidden-unit BERT (HuBERT). HuBERT applies a frame-level cross-entropy (CE) loss, which is similar to most acoustic model training. However, CTCBERT performs the model training with the Connectionist Temporal Classification (CTC) objective after removing duplicated IDs in each masked region. The idea stems from the observation that there can be significant errors in alignments when using clustered or aligned IDs. CTC learns alignments implicitly, indicating that learning with CTC can be more flexible when misalignment exists. We examine CTCBERT on IDs from HuBERT Iter1, HuBERT Iter2, and PBERT. The CTC training brings consistent improvements compared to the CE training. Furthermore, when loading blank-related parameters during finetuning, slight improvements are observed. Evaluated on the Librispeech 960-100h…
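The target-preparation step described in the abstract (removing duplicated IDs in each masked region before applying CTC) might look roughly like the sketch below. It assumes frame-level IDs and a boolean mask of equal length, and it treats each contiguous masked span as its own target sequence. The function names and the per-region grouping are illustrative assumptions, not the authors' implementation.

```python
from itertools import groupby


def masked_regions(mask):
    """Yield (start, end) index pairs for each contiguous run of True in mask."""
    start = None
    for i, is_masked in enumerate(mask):
        if is_masked and start is None:
            start = i
        elif not is_masked and start is not None:
            yield (start, i)
            start = None
    if start is not None:
        yield (start, len(mask))


def ctc_targets(frame_ids, mask):
    """Collapse consecutive duplicate IDs inside each masked region.

    Returns one target sequence per masked region (an assumption of this sketch).
    """
    targets = []
    for start, end in masked_regions(mask):
        region = frame_ids[start:end]
        # groupby merges runs of identical adjacent IDs into a single ID.
        targets.append([unit_id for unit_id, _ in groupby(region)])
    return targets


# Hypothetical frame-level cluster IDs and mask, for illustration only.
frame_ids = [7, 7, 3, 3, 3, 9, 9, 4, 4, 4, 4, 2]
mask = [False, True, True, True, True, True,
        False, False, True, True, True, True]
print(ctc_targets(frame_ids, mask))  # [[7, 3, 9], [4, 2]]
```

The collapsed sequences are shorter than the number of masked frames, which is what allows a CTC loss to choose the frame-to-ID alignment itself instead of being tied to the frame-level labels.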
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Adam · Attention Dropout · WordPiece · Layer Normalization · Residual Connection · Dropout
