Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling

Hairong Liu; Zhenyao Zhu; Xiangang Li; Sanjeev Satheesh

arXiv:1703.00096 · cs.CL · August 15, 2017 · 25 citations

TL;DR

Gram-CTC introduces a loss function for sequence labelling that automatically learns the set of basic units (grams) and the decomposition of target sequences into them, improving both accuracy and efficiency in speech recognition.

Contribution

It extends the CTC loss so that both the set of basic units and the decomposition of target sequences are selected automatically rather than fixed in advance.
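To make "sequence decomposition" concrete: plain CTC over character units admits exactly one way to write a target as a unit sequence, whereas a gram vocabulary that also contains multi-character grams admits several. The sketch below enumerates every split of a target into grams from a small, hand-picked gram set. It is an illustration only; the function name `decompositions` and the example gram sets are hypothetical, and the paper's procedure for choosing the gram set is not shown.

```python
from functools import lru_cache


def decompositions(target: str, grams: set[str]) -> list[list[str]]:
    """Return every way to split `target` into a sequence of grams from `grams`.

    Hypothetical helper for illustration; `grams` must be a non-empty set of
    non-empty strings.
    """
    max_len = max(len(g) for g in grams)

    @lru_cache(maxsize=None)
    def split_from(i: int) -> tuple[tuple[str, ...], ...]:
        # All gram sequences that spell out target[i:].
        if i == len(target):
            return ((),)  # exactly one way to spell the empty suffix: no grams
        results = []
        for n in range(1, min(max_len, len(target) - i) + 1):
            piece = target[i:i + n]
            if piece in grams:
                for rest in split_from(i + n):
                    results.append((piece,) + rest)
        return tuple(results)

    return [list(d) for d in split_from(0)]


# Character units only (plain CTC): a single decomposition.
print(decompositions("cat", {"c", "a", "t"}))
# [['c', 'a', 't']]

# Adding bigrams: the same target now has several decompositions.
print(decompositions("cat", {"c", "a", "t", "ca", "at"}))
# [['c', 'a', 't'], ['c', 'at'], ['ca', 't']]
```

A fixed-unit model has to commit to one of these splits up front; with grams, the choice among them becomes something the training objective can account for.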

Findings

1. Gram-CTC outperforms CTC in speech recognition accuracy.
2. It improves computational efficiency over traditional CTC.
3. It achieves state-of-the-art results on standard benchmarks.
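One plausible way to read the efficiency finding, consistent with the abstract's point that Gram-CTC can output a variable number of characters per time step, is that a target needs fewer output frames when a single frame can emit several characters. The sketch below counts the fewest frames needed to emit a target under CTC-style collapsing, assuming each emitted gram occupies one frame and two identical grams in a row need a blank frame between them. The helper `min_frames` and the gram sets are hypothetical; this counts frames only and is not a measurement of training or decoding cost.

```python
from typing import Optional


def min_frames(target: str, grams: set[str]) -> Optional[int]:
    """Fewest output frames that can emit `target` under CTC-style collapsing.

    Hypothetical helper: each emitted gram occupies one frame, and two identical
    grams in a row need a blank frame between them (otherwise they would merge).
    Returns None if `target` cannot be spelled with `grams`.
    """
    # best[i] maps "last gram emitted" -> fewest frames to emit target[:i].
    best: list[dict[Optional[str], int]] = [{} for _ in range(len(target) + 1)]
    best[0][None] = 0
    for i in range(len(target)):
        for last, frames in best[i].items():
            for g in grams:
                if g and target.startswith(g, i):
                    # One frame for g, plus a separating blank if g repeats.
                    cost = frames + 1 + (1 if g == last else 0)
                    j = i + len(g)
                    if cost < best[j].get(g, float("inf")):
                        best[j][g] = cost
    return min(best[len(target)].values(), default=None)


chars = set("helo")
print(min_frames("hello", chars))                 # 6: h e l <blank> l o
print(min_frames("hello", chars | {"he", "ll"}))  # 3: he ll o
```

Keying the table on the last emitted gram is what lets the count charge the extra blank only when the same gram would otherwise be repeated.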

Abstract

Most existing sequence labelling models rely on a fixed decomposition of a target sequence into a sequence of basic units. These methods suffer from two major drawbacks: 1) the set of basic units is fixed, such as the set of words, characters or phonemes in speech recognition, and 2) the decomposition of target sequences is fixed. These drawbacks usually result in sub-optimal performance in sequence modelling. In this paper, we extend the popular CTC loss criterion to alleviate these limitations, and propose a new loss function called Gram-CTC. While preserving the advantages of CTC, Gram-CTC automatically learns the best set of basic units (grams), as well as the most suitable decomposition of target sequences. Unlike CTC, Gram-CTC allows the model to output a variable number of characters at each time step, which enables the model to capture longer-term dependencies and improves the…
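The abstract describes Gram-CTC as an extension of CTC in which each time step may emit a gram (one or more characters) instead of a single character. The sketch below computes the total probability of a character target from per-frame gram distributions with a CTC-style forward recursion that sums over every alignment and every gram decomposition. It assumes CTC's collapse rule carries over to grams (merge consecutive repeats of the same gram, then drop blanks) and works in plain probability space for readability; a practical version would use log probabilities and a network's softmax outputs. The names `gram_ctc_prob` and `BLANK` are hypothetical, and this is not the authors' implementation.

```python
from collections import defaultdict

BLANK = "<b>"  # hypothetical name for the blank symbol


def gram_ctc_prob(
    probs: list[dict[str, float]], target: str, grams: set[str]
) -> float:
    """Total probability of `target` given per-frame distributions `probs`.

    probs[t] maps BLANK and every gram to its probability at frame t.
    A state (i, s) means target[:i] has been emitted and frame t outputs s,
    where s is BLANK or a gram that ends exactly at position i.
    """
    L = len(target)
    # grams_ending_at[i]: grams g with target[i - len(g):i] == g
    grams_ending_at = [
        [g for g in grams if 0 < len(g) <= i and target[i - len(g):i] == g]
        for i in range(L + 1)
    ]
    # Frame 0: either a blank, or any gram that is a prefix of the target.
    alpha = defaultdict(float)
    alpha[(0, BLANK)] = probs[0][BLANK]
    for g in grams:
        if g and target.startswith(g):
            alpha[(len(g), g)] = probs[0][g]

    for p in probs[1:]:
        new = defaultdict(float)
        for i in range(L + 1):
            # Blank at i: stay on blank, or leave a gram that ended at i.
            into_blank = alpha[(i, BLANK)] + sum(alpha[(i, g)] for g in grams_ending_at[i])
            new[(i, BLANK)] = into_blank * p[BLANK]
            for g in grams_ending_at[i]:
                j = i - len(g)  # g spans target[j:i]
                total = alpha[(i, g)]       # repeat of g, merged by the collapse rule
                total += alpha[(j, BLANK)]  # blank, then start g
                # A different gram ending at j may hand off directly; the same
                # gram would be merged, so it needs a blank in between.
                total += sum(alpha[(j, h)] for h in grams_ending_at[j] if h != g)
                new[(i, g)] = total * p[g]
        alpha = new

    # Accept paths that end on a blank or on a gram covering the last character.
    return alpha[(L, BLANK)] + sum(alpha[(L, g)] for g in grams_ending_at[L])


grams = {"a", "b", "ab"}
uniform = {BLANK: 0.25, "a": 0.25, "b": 0.25, "ab": 0.25}
# Valid 2-frame paths: (a, b), (ab, ab), (ab, blank), (blank, ab); each 1/16.
print(gram_ctc_prob([uniform, uniform], "ab", grams))  # 0.25
```

With only single-character grams the recursion reduces to the ordinary CTC forward pass; adding "ab" lets the same target be reached through paths such as (ab, blank) that plain CTC does not have.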

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing