CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency

Keyu An; Hongyu Xiang; Zhijian Ou

arXiv:2005.13326 · eess.AS · August 7, 2020 · 1 citation

TL;DR

This paper introduces CAT, an open-source speech recognition toolkit built on CTC-CRFs. It combines the data efficiency of the hybrid approach with the simplicity of the end-to-end approach, and it supports low-latency streaming ASR.

Contribution

The paper presents a novel CTC-CRF based toolkit that simplifies training, improves data efficiency, and enables low-latency streaming ASR with a new contextualized soft forgetting method.

Findings

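01. CAT obtains state-of-the-art results on English and Chinese benchmarks, comparable to fine-tuned hybrid models in Kaldi but with a much simpler training pipeline.

02. CAT outperforms existing non-modularized E2E models on limited-scale datasets.

03. The proposed contextualized soft forgetting method enables streaming ASR without accuracy degradation.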

Abstract

In this paper, we present a new open source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit). CAT inherits the data-efficiency of the hybrid approach and the simplicity of the E2E approach, providing a full-fledged implementation of CTC-CRFs and complete training and testing scripts for a number of English and Chinese benchmarks. Experiments show CAT obtains state-of-the-art results, which are comparable to the fine-tuned hybrid models in Kaldi but with a much simpler training pipeline. Compared to existing non-modularized E2E models, CAT performs better on limited-scale datasets, demonstrating its data efficiency. Furthermore, we propose a new method called contextualized soft forgetting, which enables CAT to do streaming ASR without accuracy degradation. We hope CAT, especially the CTC-CRF based framework and software, will be of broad interest to the community,…

Code & Models

Repositories

thu-spmi/cat (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing