Subword and Crossword Units for CTC Acoustic Models

Thomas Zenkel; Ramon Sanabria; Florian Metze; Alex Waibel

arXiv:1712.06855·cs.CL·June 19, 2018

TL;DR

This paper uses Byte Pair Encoding to learn subword and crossword unit sets of arbitrary size for CTC speech recognition. Matching the unit set size to the available training data, and decoding with a separate language model, yields state-of-the-art results for grapheme-based CTC systems.

Contribution

It presents a novel unit set creation method for CTC models based on Byte Pair Encoding, and evaluates both crossword units, which may span multiple words, and subword units.

Findings

01 Achieved state-of-the-art results for grapheme-based CTC systems.

02 Showed that the unit set size can be traded off against the available training data.

03 Improved recognition performance by combining the learned units with decoding that uses a separate language model.

Abstract

This paper proposes a novel approach to creating a unit set for CTC-based speech recognition systems. Using Byte Pair Encoding, we learn a unit set of arbitrary size from a given training text. In contrast to using characters or words as units, this allows us to find a good trade-off between the size of our unit set and the available training data. We evaluate both crossword units, which may span multiple words, and subword units. By combining this approach with decoding methods that use a separate language model, we achieve state-of-the-art results for grapheme-based CTC systems.
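The unit-learning step described in the abstract can be illustrated with a minimal sketch of Byte Pair Encoding. This is not the authors' implementation: the function names, the `_` word-boundary marker, and the tie-breaking rule are assumptions made for illustration. The `crossword` flag controls whether merges may cross word boundaries (crossword units) or must stay inside a word (subword units), and `num_merges` sets how far the unit set grows beyond single characters.

```python
from collections import Counter

WORD_BOUNDARY = "_"  # assumed stand-in for the space symbol; not taken from the paper


def to_sequences(lines, crossword):
    """Split text into symbol sequences; merges never cross sequence borders."""
    seqs = []
    for line in lines:
        words = line.split()
        if crossword:
            # One sequence per line, so merges may span word boundaries.
            seqs.append(list(WORD_BOUNDARY.join(words)))
        else:
            # One sequence per word, so merges stay inside a word.
            seqs.extend(list(w) for w in words)
    return seqs


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe(lines, num_merges, crossword=False):
    """Return the learned merges and the resulting unit set."""
    seqs = to_sequences(lines, crossword)
    units = {sym for seq in seqs for sym in seq}  # start from single characters
    merges = []
    for _ in range(num_merges):
        counts = Counter()
        for seq in seqs:
            counts.update(zip(seq, seq[1:]))
        if not counts:
            break
        # Most frequent pair; ties broken lexicographically (an assumption).
        best = min(counts, key=lambda p: (-counts[p], p))
        merges.append(best)
        units.add(best[0] + best[1])
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges, sorted(units)


if __name__ == "__main__":
    text = ["the cat", "the hat", "the mat"]
    print(learn_bpe(text, num_merges=3, crossword=False))
    print(learn_bpe(text, num_merges=4, crossword=True))
```

Each merge adds exactly one unit, so the number of merges directly sets the unit set size: zero merges gives a character-level system, and a very large number of merges approaches word-level (or, with `crossword=True`, multi-word) units.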
