Optimal Completion Distillation for Sequence Learning

Sara Sabour; William Chan; Mohammad Norouzi

arXiv:1810.01398·cs.LG·January 16, 2019·5 cites

TL;DR

Optimal Completion Distillation (OCD) is a training method for sequence-to-sequence models that optimizes edit distance directly. It has no hyper-parameters of its own, requires no pretraining, and reports state-of-the-art results on end-to-end speech recognition.

Contribution

OCD introduces an efficient, hyper-parameter-free training procedure based on optimal suffixes for sequence-to-sequence models, improving performance on speech recognition tasks.

Findings

01

Achieves 9.3% WER on the Wall Street Journal dataset.

02

Achieves 4.5% WER on the Librispeech dataset.

03

Reports state-of-the-art end-to-end speech recognition performance on both datasets.

Abstract

We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence to sequence models based on edit distance. OCD is efficient, has no hyper-parameters of its own, and does not require pretraining or joint optimization with conditional log-likelihood. Given a partial sequence generated by the model, we first identify the set of optimal suffixes that minimize the total edit distance, using an efficient dynamic programming algorithm. Then, for each position of the generated sequence, we use a target distribution that puts equal probability on the first token of all the optimal suffixes. OCD achieves the state-of-the-art performance on end-to-end speech recognition, on both Wall Street Journal and Librispeech datasets, achieving $9.3\%$ WER and $4.5\%$ WER respectively.
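
The target construction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it assumes character-level tokens, a hypothetical `EOS` marker, and hypothetical function names (`optimal_next_tokens`, `target_distribution`). It relies on the standard edit-distance table: for a generated prefix, every reference prefix at minimum edit distance can be extended by the rest of the reference, so the next reference token (or `EOS` at the end) starts an optimal suffix.

```python
from typing import Dict, List, Sequence, Set

EOS = "</s>"  # hypothetical end-of-sequence marker, assumed for illustration


def optimal_next_tokens(generated: Sequence[str],
                        reference: Sequence[str]) -> List[Set[str]]:
    """For each prefix length t = 0..len(generated), return the set of
    tokens that begin a suffix minimizing the total edit distance."""
    n = len(reference)
    # row[j] = edit distance between generated[:t] and reference[:j]
    row = list(range(n + 1))
    targets: List[Set[str]] = []
    for t in range(len(generated) + 1):
        if t > 0:
            prev = row
            row = [t] + [0] * n
            for j in range(1, n + 1):
                substitute = prev[j - 1] + (generated[t - 1] != reference[j - 1])
                row[j] = min(prev[j] + 1,      # extra token in generated prefix
                             row[j - 1] + 1,   # token missing from generated prefix
                             substitute)       # match or substitution
        best = min(row)
        # Reference prefixes at minimum distance are the best alignments;
        # the token following each one starts an optimal suffix.
        targets.append({reference[j] if j < n else EOS
                        for j in range(n + 1) if row[j] == best})
    return targets


def target_distribution(tokens: Set[str]) -> Dict[str, float]:
    """Equal probability on the first token of every optimal suffix."""
    p = 1.0 / len(tokens)
    return {tok: p for tok in tokens}
```

For instance, with reference `"SUNDAY"` and generated prefix `"SA"`, the reference prefixes `"S"` and `"SU"` are both at edit distance 1, so the sketch assigns equal probability to `U` and `N` as the next token.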

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications