DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models
Tzu-Quan Lin, Hung-yi Lee, Hao Tang

TL;DR
DAISY introduces a data-adaptive early exit method for self-supervised speech models. It reduces inference time and computational cost by choosing the exit point from the self-supervised loss, so the model exits earlier on clean input and later on noisy input, without per-task early exit models or fine-tuning of the pretrained model.
Contribution
It proposes an early exit strategy that decides when to exit based on the self-supervised loss, avoiding a separate early exit model for each task and fine-tuning of the pretrained model, and it matches HuBERT on MiniSUPERB with faster inference.
Findings
DAISY matches HuBERT's performance on MiniSUPERB.
It exits early on clean data and later on noisy data, adapting to input noise levels.
Inference is much faster than HuBERT's, with matching performance on MiniSUPERB.
Abstract
Self-supervised speech models have been shown to be useful for various tasks, but their large size limits their use on devices with low computing power and memory. In this work, we explore early exit, an approach for reducing latency by exiting the forward process of a network early. Most early exit approaches need a separate early exit model for each task, with some even requiring fine-tuning of the entire pretrained model. We introduce Data Adaptive Self-Supervised Early Exit (DAISY), an approach that decides when to exit based on the self-supervised loss, eliminating the need for multiple rounds of training and fine-tuning. DAISY matches the performance of HuBERT on the MiniSUPERB benchmark, but with much faster inference times. Our analysis of the adaptivity of DAISY shows that the model exits early (using fewer layers) on clean data and exits late (using more layers) on noisy data,…
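The exit rule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes that each layer has a small prediction head and that the entropy of that head's output serves as a label-free stand-in for the self-supervised loss at inference time. The names `early_exit_forward`, `layers`, `heads`, and `threshold`, along with the toy doubling layer and the "clean" and "noisy" inputs, are hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_forward(
    x: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    heads: Sequence[Callable[[Vector], Vector]],
    threshold: float,
) -> Tuple[Vector, int]:
    """Run layers in order and stop at the first layer whose head is confident.

    Assumption: low entropy of the head's prediction is used as a proxy for a
    low self-supervised loss. Returns the hidden state and the layers used.
    """
    hidden = x
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        hidden = layer(hidden)
        if entropy(softmax(head(hidden))) < threshold:
            return hidden, depth  # confident enough: exit early
    return hidden, len(layers)  # no layer was confident: use the full stack


# Toy stand-ins (hypothetical): each "layer" sharpens its input by doubling it,
# and each "head" reads the hidden state directly as logits.
def double(h: Vector) -> Vector:
    return [2.0 * v for v in h]


def identity_head(h: Vector) -> Vector:
    return list(h)


layers = [double] * 6
heads = [identity_head] * 6

clean_input = [1.0, 0.0]   # well-separated logits, standing in for clean audio
noisy_input = [0.6, 0.4]   # ambiguous logits, standing in for noisy audio

_, clean_depth = early_exit_forward(clean_input, layers, heads, threshold=0.2)
_, noisy_depth = early_exit_forward(noisy_input, layers, heads, threshold=0.2)
print(clean_depth, noisy_depth)
```

In this toy setup the ambiguous input needs more layers before the head becomes confident, which mirrors the qualitative behavior reported in the abstract. The threshold trades speed against quality: a lower value forces later exits.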
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing
