Exploring RWKV for Memory Efficient and Low Latency Streaming ASR

Keyu An; Shiliang Zhang

arXiv:2309.14758 · eess.AS · September 27, 2023 · 2 cites

TL;DR

This paper applies RWKV, a linear attention transformer variant, to streaming ASR and shows accuracy comparable to or better than chunk conformer transducers, with lower latency and inference memory cost.

Contribution

The paper applies RWKV to streaming ASR. RWKV combines transformer-level accuracy with RNN-like inference efficiency, making it suitable for low-latency, memory-constrained environments.
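As a rough illustration of why RNN-style inference suits memory-constrained streaming, the sketch below counts the per-stream state an encoder must keep between decoding steps. It assumes the RWKV-4 RNN-mode state layout (five d_model-sized vectors per layer) and a simplified attention cache holding only keys and values for a fixed number of past frames. The helper names and the 12-layer, 256-dimensional encoder are hypothetical and not taken from the paper.

```python
def rwkv_state_floats(n_layers: int, d_model: int) -> int:
    """Per-stream recurrent state of an RWKV-4-style encoder in RNN mode.

    Per layer: previous input for the time-mixing token shift, WKV
    numerator a, WKV denominator b, running max p (kept for numerical
    stability), and previous input for the channel-mixing token shift,
    i.e. 5 vectors of size d_model.
    """
    return n_layers * 5 * d_model


def attention_kv_cache_floats(n_layers: int, d_model: int, cached_frames: int) -> int:
    """Keys and values cached for `cached_frames` past frames in every
    self-attention layer (convolution caches are ignored)."""
    return n_layers * 2 * cached_frames * d_model


# Hypothetical encoder size, not taken from the paper.
n_layers, d_model = 12, 256
print(rwkv_state_floats(n_layers, d_model))  # 15360
for frames in (16, 64, 256):
    # 16 -> 98304, 64 -> 393216, 256 -> 1572864
    print(frames, attention_kv_cache_floats(n_layers, d_model, frames))
```

Under these assumptions the RWKV state does not depend on how many past frames influence the output, while the attention cache grows linearly with the cached left context. Real chunk conformers also keep convolution caches, which this count ignores.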

Findings

01

RWKV-Transducer achieves accuracy comparable to that of a chunk conformer transducer.

02

RWKV models incur minimal latency and inference memory cost.

03

Experiments on datasets ranging from 100h to 10000h of speech show that the approach is effective and scales with data size.

Abstract

Recently, self-attention-based transformers and conformers have been introduced as alternatives to RNNs for ASR acoustic modeling. Nevertheless, the full-sequence attention mechanism is non-streamable and computationally expensive, thus requiring modifications, such as chunking and caching, for efficient streaming ASR. In this paper, we propose to apply RWKV, a variant of linear attention transformer, to streaming ASR. RWKV combines the superior performance of transformers and the inference efficiency of RNNs, which makes it well-suited for streaming ASR scenarios where the budget for latency and memory is restricted. Experiments on varying scales (100h - 10000h) demonstrate that RWKV-Transducer and RWKV-Boundary-Aware-Transducer achieve accuracy comparable to, or even better than, that of the chunk conformer transducer, with minimal latency and inference memory cost.
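The recurrence behind RWKV's RNN-like inference can be sketched with the WKV operator from the RWKV-4 formulation; the source does not state which RWKV version the authors used, so this is an assumption. The minimal NumPy sketch computes the operator once in its direct form, where each output attends to all past frames with exponentially decaying weights, and once as a recurrence that carries only two per-channel sums between frames. The function names are hypothetical, and the numerically stable max-tracking, token shift, receptance gating, and projections of a full RWKV block are omitted.

```python
import numpy as np


def wkv_parallel(w, u, k, v):
    """Direct form of the RWKV-4 WKV operator (quadratic in sequence length).

    w: (C,) positive per-channel decay rates
    u: (C,) per-channel bonus applied to the current frame
    k, v: (T, C) keys and values
    """
    T, C = k.shape
    out = np.zeros((T, C))
    for t in range(T):
        # The current frame gets weight exp(u + k_t).
        num = np.exp(u + k[t]) * v[t]
        den = np.exp(u + k[t])
        # Past frame i gets weight exp(-(t - 1 - i) * w + k_i).
        for i in range(t):
            weight = np.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out[t] = num / den
    return out


def wkv_recurrent(w, u, k, v):
    """Same operator computed frame by frame with O(C) state (RNN mode)."""
    T, C = k.shape
    a = np.zeros(C)  # decayed sum of exp(k_i) * v_i over past frames
    b = np.zeros(C)  # decayed sum of exp(k_i) over past frames
    decay = np.exp(-w)
    out = np.zeros((T, C))
    for t in range(T):
        cur = np.exp(u + k[t])
        out[t] = (a + cur * v[t]) / (b + cur)
        # Fold the current frame into the state before the next frame arrives.
        a = decay * a + np.exp(k[t]) * v[t]
        b = decay * b + np.exp(k[t])
    return out


rng = np.random.default_rng(0)
T, C = 6, 4
w = rng.uniform(0.1, 1.0, size=C)
u = rng.normal(size=C)
k = rng.normal(size=(T, C))
v = rng.normal(size=(T, C))
print(np.allclose(wkv_parallel(w, u, k, v), wkv_recurrent(w, u, k, v)))  # True
```

Because each output depends only on the current and past frames and the recurrent form keeps a fixed-size state, an encoder built on this operator can emit outputs frame by frame without chunking or an attention cache, which matches the efficiency argument in the abstract.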

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Underwater Acoustics Research