Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling
Hairong Liu, Zhenyao Zhu, Xiangang Li, Sanjeev Satheesh

TL;DR
Gram-CTC introduces an adaptive loss function for sequence labeling that automatically learns optimal units and decompositions, enhancing performance and efficiency in speech recognition tasks.
Contribution
Gram-CTC extends the CTC loss to automatically select both the basic units and the decomposition of each target sequence, removing the restriction to a fixed unit set.
Findings
Gram-CTC outperforms CTC in speech recognition accuracy.
It improves computational efficiency over traditional CTC.
It achieves state-of-the-art results on standard benchmarks.
Abstract
Most existing sequence labelling models rely on a fixed decomposition of a target sequence into a sequence of basic units. These methods suffer from two major drawbacks: 1) the set of basic units is fixed, such as the set of words, characters or phonemes in speech recognition, and 2) the decomposition of target sequences is fixed. These drawbacks usually result in sub-optimal performance of modeling sequences. In this paper, we extend the popular CTC loss criterion to alleviate these limitations, and propose a new loss function called Gram-CTC. While preserving the advantages of CTC, Gram-CTC automatically learns the best set of basic units (grams), as well as the most suitable decomposition of target sequences. Unlike CTC, Gram-CTC allows the model to output variable number of characters at each time step, which enables the model to capture longer term dependency and improves the…
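To make the idea of "decomposition of target sequences" concrete, the following minimal Python sketch enumerates every way a target string can be split into grams from a given set. It is an illustration only, not the paper's loss or training procedure: the function name `decompositions` and the toy gram set are assumptions introduced here.

```python
def decompositions(target, grams):
    """Return every way to split `target` into consecutive pieces drawn from `grams`."""
    if not target:
        return [[]]  # one way to decompose the empty string: no pieces
    results = []
    for end in range(1, len(target) + 1):
        head = target[:end]
        if head in grams:
            # Keep this gram and recurse on the remainder of the target.
            for tail in decompositions(target[end:], grams):
                results.append([head] + tail)
    return results


# Hypothetical gram set: single characters plus two bigrams.
grams = {"c", "a", "t", "ca", "at"}
for pieces in decompositions("cat", grams):
    print(pieces)
```

With a character-only gram set the function returns a single decomposition, which corresponds to the fixed-unit setting the abstract describes. Adding longer grams yields several candidate decompositions of the same target, and the abstract's claim is that Gram-CTC learns which of these are most suitable rather than fixing one in advance.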
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
