CTC Variations Through New WFST Topologies

Aleksandr Laptev; Somshubra Majumdar; Boris Ginsburg

arXiv:2110.03098·eess.AS·September 27, 2022

TL;DR

This paper introduces three novel WFST topologies for CTC-like algorithms in speech recognition. Two of them shrink decoding graphs and training memory (one at a small accuracy cost), and the third can improve accuracy for wide context window models.

Contribution

It proposes three CTC variants (compact-CTC, minimal-CTC, and selfless-CTC), each defined by its own WFST topology, that trade off decoding-graph size, training memory, and recognition accuracy.

Findings

01

Compact-CTC yields 1.5x smaller WFST decoding graphs and halves memory consumption when training with the LF-MMI objective, without hurting accuracy.

02

Minimal-CTC reduces graph size by 2x and memory consumption by 4x, at the cost of a small accuracy drop.

03

Selfless-CTC can improve accuracy for wide context window models.

Abstract

This paper presents novel Weighted Finite-State Transducer (WFST) topologies to implement Connectionist Temporal Classification (CTC)-like algorithms for automatic speech recognition. Three new CTC variants are proposed: (1) the "compact-CTC", in which direct transitions between units are replaced with <epsilon> back-off transitions; (2) the "minimal-CTC", which only adds <blank> self-loops when used in WFST composition; and (3) the "selfless-CTC" variants, which disallow self-loops for non-blank units. Compact-CTC allows for 1.5 times smaller WFST decoding graphs and halves memory consumption when training CTC models with the LF-MMI objective, without hurting recognition accuracy. Minimal-CTC reduces graph size by two times and memory consumption by four times, at the cost of a small accuracy drop. Using selfless-CTC can improve accuracy for wide context window models.
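The three topologies described in the abstract can be sketched as plain arc lists. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `(source, destination, input, output)` arc layout, and the choice of state 0 as the <blank> state are all hypothetical, the baseline is assumed to be the usual fully connected CTC topology, and selfless-CTC is shown applied to that baseline only.

```python
from typing import List, Tuple

# Hypothetical arc layout: (source state, destination state, input label, output label)
Arc = Tuple[int, int, str, str]
EPS, BLANK = "<epsilon>", "<blank>"


def standard_ctc(units: List[str]) -> List[Arc]:
    """Assumed baseline: fully connected, state 0 is <blank>, state i is units[i-1]."""
    labels = [BLANK] + units
    arcs = []
    for src in range(len(labels)):
        for dst in range(len(labels)):
            # Self-loops and arcs entering the blank state emit nothing.
            out = EPS if src == dst or dst == 0 else labels[dst]
            arcs.append((src, dst, labels[dst], out))
    return arcs


def compact_ctc(units: List[str]) -> List[Arc]:
    """Direct unit-to-unit arcs are replaced by <epsilon> back-off arcs to state 0."""
    arcs = [(0, 0, BLANK, EPS)]
    for i, unit in enumerate(units, start=1):
        arcs.append((0, i, unit, unit))  # enter the unit from the blank state
        arcs.append((i, i, unit, EPS))   # repeat the unit
        arcs.append((i, 0, EPS, EPS))    # back off to the blank state
    return arcs


def minimal_ctc(units: List[str]) -> List[Arc]:
    """Single state: a <blank> self-loop plus one pass-through self-loop per unit."""
    return [(0, 0, BLANK, EPS)] + [(0, 0, unit, unit) for unit in units]


def selfless_ctc(units: List[str]) -> List[Arc]:
    """Baseline topology with self-loops removed for every non-blank state."""
    return [arc for arc in standard_ctc(units) if arc[0] != arc[1] or arc[0] == 0]
```

For three units, these sketches produce 16, 10, 4, and 13 arcs respectively. These are arc counts of the bare topologies only; the 1.5x and 2x figures in the abstract refer to composed decoding graphs and are not reproduced by this count.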

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing

Methods: weighted finite state transducer