A light-weight and efficient punctuation and word casing prediction model for on-device streaming ASR

Jian You; Xiangfeng Li

arXiv:2407.13142·cs.CL·July 19, 2024



TL;DR

This paper introduces a lightweight CNN-BiLSTM model for real-time punctuation and casing prediction in on-device streaming ASR, achieving high accuracy with minimal size and fast inference.

Contribution

It presents a novel, efficient model for on-device ASR punctuation and casing prediction that outperforms non-Transformer models on F1-score while being much smaller and faster than Transformer-based models.

Findings

01. 9% relative improvement in overall F1-score over the best non-Transformer model

02. Model is 40 times smaller than Transformer models

03. Inference is 2.5 times faster than Transformer-based approaches
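The F1 finding is a relative gain, not a gain in absolute F1 points, and the two are easy to confuse. The short sketch below shows the difference. The baseline and proposed F1 values in it are hypothetical placeholders chosen only to make the arithmetic visible; they are not results from the paper.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative gain of `proposed` over `baseline`, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (proposed - baseline) / baseline


def absolute_improvement(baseline: float, proposed: float) -> float:
    """Plain difference between the two scores, in the same units as the inputs."""
    return proposed - baseline


# Hypothetical F1 scores for illustration only (NOT taken from the paper).
baseline_f1 = 0.50
proposed_f1 = 0.545

print(round(relative_improvement(baseline_f1, proposed_f1), 3))  # 0.09
print(round(absolute_improvement(baseline_f1, proposed_f1), 3))  # 0.045
```

With these placeholder values, a 9% relative improvement corresponds to only 4.5 absolute F1 points, which is why the baseline matters when reading the claim.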

Abstract

Punctuation and word casing prediction are necessary for automatic speech recognition (ASR). With the popularity of on-device end-to-end streaming ASR systems, on-device punctuation and word casing prediction has become a necessity, yet we found little discussion of it. Since the emergence of the Transformer, Transformer-based models have been explored for this scenario. However, Transformer-based models are too large for on-device ASR systems. In this paper, we propose a light-weight and efficient model that jointly predicts punctuation and word casing in real time. The model is based on a Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM). Experimental results on the IWSLT2011 test set show that the proposed model obtains a 9% relative improvement in overall F1-score over the best non-Transformer model. Compared to the representative of…
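To make the joint prediction task concrete, here is a minimal sketch of how per-token casing and punctuation labels could be applied to the lowercase, unpunctuated output of an ASR system. It illustrates only the post-processing step, not the CNN-BiLSTM model itself. The label names (`CAP`, `UPPER`, `LOWER`, `COMMA`, `PERIOD`, `QUESTION`, `O`) and the helper functions are assumptions made for illustration; the paper's actual label scheme may differ.

```python
from typing import List, Tuple

# Hypothetical punctuation labels mapped to the symbol appended after a word.
PUNCT = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}


def apply_case(word: str, case: str) -> str:
    """Apply a (hypothetical) casing label to a lowercase word."""
    if case == "UPPER":
        return word.upper()
    if case == "CAP":
        return word[:1].upper() + word[1:]
    return word  # "LOWER": leave the word unchanged


def restore(tokens: List[str], labels: List[Tuple[str, str]]) -> str:
    """Rebuild readable text from tokens and per-token (case, punct) labels."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    words = [apply_case(w, case) + PUNCT[punct]
             for w, (case, punct) in zip(tokens, labels)]
    return " ".join(words)


# Example: raw ASR tokens plus one jointly predicted label pair per token.
tokens = ["hello", "world", "how", "are", "you"]
labels = [("CAP", "O"), ("LOWER", "PERIOD"),
          ("CAP", "O"), ("LOWER", "O"), ("LOWER", "QUESTION")]
print(restore(tokens, labels))  # Hello world. How are you?
```

Because each token carries both labels, a single forward pass of a joint model can supply everything `restore` needs, which is one reason joint prediction suits a streaming, on-device setting.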


Taxonomy

Topics: Service-Oriented Architecture and Web Services

Methods: Sparse Evolutionary Training · Residual Connection · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Adam · Dropout · Multi-Head Attention · Dense Connections