On-device Streaming Discrete Speech Units

Kwanghee Choi; Masao Someki; Emma Strubell; Shinji Watanabe

arXiv:2506.01845 · eess.AS · June 3, 2025

TL;DR

This paper makes discrete speech units (DSUs) more practical for on-device streaming by shrinking both the model size and the attention window of the underlying self-supervised speech model, cutting FLOPs by 50% at the cost of a 6.5% relative increase in character error rate (CER) on ML-SUPERB 1h.

Contribution

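The paper reduces both the attention window and the model size of the self-supervised speech model (S3M) used to extract DSUs, while preserving the effectiveness of the resulting units. This removes the need for full-length speech input and lowers computational cost, which points toward real-time, on-device speech processing.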
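A limited attention window can be pictured as a boolean mask over frame pairs. The sketch below is only an illustration: the `left`/`right` parameterization and the function names are assumptions, and this summary does not say how the paper actually restricts its window.

```python
def windowed_attention_mask(num_frames: int, left: int, right: int = 0) -> list[list[bool]]:
    """Return mask[i][j] == True when frame i may attend to frame j.

    `left` and `right` are hypothetical parameters: how many past and
    future frames each frame can see. right=0 means no look-ahead.
    """
    return [
        [(i - left) <= j <= (i + right) for j in range(num_frames)]
        for i in range(num_frames)
    ]


def attended_pairs(mask: list[list[bool]]) -> int:
    """Count the (query, key) pairs that remain visible under the mask."""
    return sum(sum(row) for row in mask)


# Full-context attention over 6 frames: every frame sees every frame.
full = windowed_attention_mask(6, left=5, right=5)

# Streaming-style attention: each frame sees itself and 2 past frames.
stream = windowed_attention_mask(6, left=2, right=0)

print(attended_pairs(full), attended_pairs(stream))
```

With no look-ahead, each frame's output depends only on frames already received, which is what lets a model emit units before the utterance ends.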

Findings

01. FLOPs reduced by 50%

02. CER increased by only 6.5% (relative)

03. Evaluated on the ML-SUPERB 1h dataset
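The 6.5% figure is a relative increase, not a gain of 6.5 absolute CER points. A small sketch of the distinction follows; the baseline value is hypothetical and chosen only for illustration, since this page reports no absolute CER numbers.

```python
def relative_increase(baseline: float, new: float) -> float:
    """Relative change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline


# Hypothetical baseline CER in percent; NOT a number from the paper.
baseline_cer = 20.0

# A 6.5% relative increase scales the baseline by 1.065.
streaming_cer = baseline_cer * 1.065

print(f"absolute change: {streaming_cer - baseline_cer:.2f} points")
print(f"relative change: {relative_increase(baseline_cer, streaming_cer):.3f}")
```

Under this assumed baseline, the absolute CER would rise by about 1.3 points rather than 6.5.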

Abstract

Discrete speech units (DSUs) are derived from clustering the features of self-supervised speech models (S3Ms). DSUs offer significant advantages for on-device streaming speech applications due to their rich phonetic information, high transmission efficiency, and seamless integration with large language models. However, conventional DSU-based approaches are impractical as they require full-length speech input and computationally expensive S3Ms. In this work, we reduce both the attention window and the model size while preserving the effectiveness of DSUs. Our results demonstrate that we can reduce floating-point operations (FLOPs) by 50% with only a relative increase of 6.5% in character error rate (CER) on the ML-SUPERB 1h dataset. These findings highlight the potential of DSUs for real-time speech processing in resource-constrained environments.
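The first sentence of the abstract, deriving DSUs by clustering S3M features, can be sketched as nearest-centroid assignment. This is a minimal illustration, not the authors' pipeline: the toy 2-dimensional features, the two centroids, and the function names are assumptions. In practice the centroids would come from a clustering algorithm such as k-means fitted on real S3M features.

```python
def squared_distance(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def to_discrete_units(features: list[list[float]],
                      centroids: list[list[float]]) -> list[int]:
    """Map each frame-level feature to the index of its nearest centroid."""
    return [
        min(range(len(centroids)),
            key=lambda k: squared_distance(frame, centroids[k]))
        for frame in features
    ]


# Toy stand-ins for frame-level S3M features (hypothetical values).
features = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.0, 0.2]]

# Toy stand-ins for cluster centroids learned offline (hypothetical values).
centroids = [[0.0, 0.0], [1.0, 1.0]]

units = to_discrete_units(features, centroids)
print(units)
```

Each frame becomes a single integer index, which is why DSUs are cheap to transmit and can be handled as tokens by a language model.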

Code & Models

Repositories

masao-someki/streamingdsu (PyTorch, official)

Taxonomy

Topics: Wireless Communication Networks Research · Multimedia Communication and Technology · IPv6, Mobility, Handover, Networks, Security

Methods: Softmax · Attention Is All You Need