Once-for-All Sequence Compression for Self-Supervised Speech Models
Hsuan-Jui Chen, Yen Meng, Hung-yi Lee

TL;DR
This paper introduces a flexible sequence compression framework for self-supervised speech models that supports a continuous range of compression rates, enabling task-specific adaptation with minimal performance loss.
Contribution
It proposes a once-for-all compression framework for self-supervised speech models in which the compression rate can be adjusted continuously and learned adaptively per task, improving both efficiency and task-specific performance.
Findings
Supports a continuous range of compression rates
Achieves minimal performance degradation
Enables adaptive, task-specific rate selection
Abstract
The sequence length along the time axis is often the dominant factor in the computational cost of speech processing. Several works have proposed reducing the sequence length to lower the computational cost of self-supervised speech models. However, different downstream tasks have different tolerances to sequence compression, so a model that produces a fixed compression rate may not fit all tasks. In this work, we introduce a once-for-all (OFA) sequence compression framework for self-supervised speech models that supports a continuous range of operating compression rates. The framework is evaluated on various tasks, showing marginal degradation compared to the fixed-compression-rate variants with a smooth performance-efficiency trade-off. We further explore adaptive compression rate learning, demonstrating the ability to select task-specific preferred frame periods without needing a grid…
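The abstract does not describe the compression operator itself. The following is a purely illustrative sketch, not the authors' method. It shows one simple way a single pooling step can accept any real-valued target frame period (and hence a continuous range of compression rates) by overlap-weighted averaging of input frames. Several details are assumptions: the function name `fractional_avg_pool`, the 20 ms input period (the usual frame period of wav2vec 2.0 / HuBERT-style encoders on 16 kHz audio), and the averaging scheme.

```python
import math
from typing import List, Sequence

Frame = List[float]


def fractional_avg_pool(
    frames: Sequence[Sequence[float]], in_period: float, out_period: float
) -> List[Frame]:
    """Hypothetical sketch: resample frames to an arbitrary frame period.

    Input frame t covers [t * in_period, (t + 1) * in_period); output frame j
    covers [j * out_period, (j + 1) * out_period). Each output frame is the
    average of the input frames it overlaps, weighted by overlap duration.
    This is an illustrative assumption, not the paper's compression method.
    """
    if out_period < in_period:
        raise ValueError("out_period must be >= in_period (compression only)")
    if not frames:
        return []
    dim = len(frames[0])
    total = len(frames) * in_period  # total duration covered by the input
    # Small tolerance avoids an extra empty window from float rounding.
    n_out = math.ceil(total / out_period - 1e-9)
    out: List[Frame] = []
    for j in range(n_out):
        start = j * out_period
        end = min((j + 1) * out_period, total)
        acc = [0.0] * dim
        weight = 0.0
        # Only input frames whose span can intersect [start, end) are visited.
        first = int(start // in_period)
        last = min(len(frames) - 1, math.ceil(end / in_period) - 1)
        for t in range(first, last + 1):
            overlap = min(end, (t + 1) * in_period) - max(start, t * in_period)
            if overlap <= 0:
                continue
            weight += overlap
            for k in range(dim):
                acc[k] += overlap * frames[t][k]
        if weight > 0:  # guard against a degenerate window at the very end
            out.append([a / weight for a in acc])
    return out


if __name__ == "__main__":
    # One second of dummy 20 ms frames (50 frames, 2-dim features).
    feats = [[float(t), float(-t)] for t in range(50)]
    for period in (20.0, 30.0, 40.0, 60.0):
        print(period, "ms ->", len(fractional_avg_pool(feats, 20.0, period)), "frames")
```

Because the target period is a real number rather than an integer stride, the same operator covers intermediate rates such as 30 ms. That is the property a once-for-all setting needs, whatever mechanism the paper actually uses.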
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems
