Subword and Crossword Units for CTC Acoustic Models
Thomas Zenkel, Ramon Sanabria, Florian Metze, Alex Waibel

TL;DR
This paper uses Byte Pair Encoding to learn subword and crossword unit sets of arbitrary size for CTC speech recognition. This lets the unit set size be traded off against the available training data, and, combined with language model decoding, achieves state-of-the-art results for grapheme-based CTC systems.
Contribution
It presents a novel method for creating unit sets for CTC models with Byte Pair Encoding, and evaluates both crossword units, which may span multiple words, and subword units.
Findings
Achieved state-of-the-art results with grapheme-based CTC systems.
Demonstrated effective trade-offs between unit set size and training data.
Improved recognition performance by combining units with language model decoding.
Abstract
This paper proposes a novel approach to creating a unit set for CTC-based speech recognition systems. By using Byte Pair Encoding, we learn a unit set of arbitrary size from a given training text. In contrast to using characters or words as units, this allows us to find a good trade-off between the size of our unit set and the available training data. We evaluate both crossword units, which may span multiple words, and subword units. By combining this approach with decoding methods that use a separate language model, we are able to achieve state-of-the-art results for grapheme-based CTC systems.
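To make the unit set idea concrete, the following is a minimal sketch of generic Byte Pair Encoding applied to a training text. It is not the authors' implementation. The function names (`learn_bpe`, `subword_units`, `crossword_units`), the `_` word-boundary symbol, and the tie-breaking rule are assumptions made for illustration. Subword units are learned per word, so merges never cross a word boundary. Crossword units are learned per line with spaces kept as a symbol, so a merged unit may span multiple words.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with one merged symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(sequences, num_merges):
    """Greedily learn up to `num_merges` merges; return the resulting unit set."""
    seqs = [list(s) for s in sequences]
    units = {sym for seq in seqs for sym in seq}  # start from single characters
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        # Most frequent pair; ties are broken lexicographically (an assumption)
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        units.add(best[0] + best[1])
        seqs = [merge_pair(seq, best) for seq in seqs]
    return units


def subword_units(lines, num_merges):
    """Subword units: each word is its own sequence, so merges stay inside words."""
    words = [w for line in lines for w in line.split()]
    return learn_bpe(words, num_merges)


def crossword_units(lines, num_merges, boundary="_"):
    """Crossword units: spaces become a symbol, so merges may span several words."""
    return learn_bpe([boundary.join(line.split()) for line in lines], num_merges)


if __name__ == "__main__":
    text = ["low low low lower"]
    print(sorted(subword_units(text, 2)))
    print(sorted(crossword_units(["ab ab ab"], 2)))
```

In this sketch the number of merges controls the size of the unit set: zero merges leaves a character-level set, and more merges move the units toward whole words or, in the crossword case, word sequences.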
