Once-for-All Sequence Compression for Self-Supervised Speech Models

Hsuan-Jui Chen; Yen Meng; Hung-yi Lee

arXiv:2211.02332·cs.CL·May 10, 2023

Open Access

TL;DR

This paper introduces a once-for-all (OFA) sequence compression framework that lets a single self-supervised speech model operate at any compression rate within a continuous range. Degradation relative to fixed-rate variants is marginal, and the rate can be adapted to each downstream task.

Contribution

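The paper proposes a once-for-all sequence compression framework whose compression rate can be adjusted continuously, so one model covers a smooth performance-efficiency trade-off. It also explores adaptive compression-rate learning, which selects a task-specific preferred frame period.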

Findings

01 Supports a continuous range of operating compression rates in a single model

02 Shows only marginal degradation compared with fixed-compression-rate variants

03 Enables adaptive learning of a task-specific preferred frame period

Abstract

The sequence length along the time axis is often the dominant factor of the computation in speech processing. Works have been proposed to reduce the sequence length for lowering the computational cost in self-supervised speech models. However, different downstream tasks have different tolerance of sequence compressing, so a model that produces a fixed compressing rate may not fit all tasks. In this work, we introduce a once-for-all (OFA) sequence compression framework for self-supervised speech models that supports a continuous range of operating compressing rates. The framework is evaluated on various tasks, showing marginal degradation compared to the fixed compressing rate variants with a smooth performance-efficiency trade-off. We further explore adaptive compressing rate learning, demonstrating the ability to select task-specific preferred frame periods without needing a grid…
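To make the idea of a continuous compression rate concrete, here is a minimal sketch of shortening a frame sequence along the time axis by a non-integer factor. It is an illustration only, not the authors' method: the function name `compress_frames`, the use of plain average pooling, and the evenly spread segment boundaries are all assumptions chosen for simplicity.

```python
from typing import List


def compress_frames(frames: List[List[float]], rate: float) -> List[List[float]]:
    """Average-pool a (T x D) frame sequence so it has roughly T / rate frames.

    Hypothetical helper for illustration; `rate` may be any float >= 1.0.
    """
    if rate < 1.0:
        raise ValueError("rate must be >= 1.0")
    if not frames:
        return []

    t = len(frames)
    dim = len(frames[0])
    out_len = max(1, round(t / rate))

    out = []
    for i in range(out_len):
        # Segment i covers input indices [start, end); boundaries are
        # spread evenly so non-integer rates yield mixed segment sizes.
        start = (i * t) // out_len
        end = max(start + 1, ((i + 1) * t) // out_len)
        seg = frames[start:end]
        out.append([sum(f[d] for f in seg) / len(seg) for d in range(dim)])
    return out


if __name__ == "__main__":
    # Eight one-dimensional frames with values 0.0 .. 7.0.
    frames = [[float(i)] for i in range(8)]
    for rate in (1.0, 1.6, 2.0, 4.0):
        print(rate, len(compress_frames(frames, rate)))
```

Because the same function accepts any rate, a downstream task that tolerates heavier compression can simply request a larger value, which is the kind of per-task flexibility the abstract describes.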

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems