TL;DR
This paper introduces a theoretical framework based on conditional independence for selecting effective pseudo-labels in self-supervised speech representation learning, which reduces computational cost and helps predict performance on downstream tasks.
Contribution
It proposes a novel, training-free estimator for pseudo-label utility grounded in conditional independence theory, specifically tailored for speech representation learning.
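The summary does not spell out the estimator itself. As a rough illustration only, one training-free way to quantify how far a pseudo-label Z is from being conditionally independent of the input X given the downstream label Y is to compute a kernel dependence measure (HSIC, the Hilbert-Schmidt Independence Criterion) within each downstream class and average the results. This is an assumed stand-in, not necessarily the paper's estimator. The function names, the Gaussian kernels, and the median bandwidth heuristic are all hypothetical, illustrative choices.

```python
import numpy as np


def _gaussian_gram(a: np.ndarray) -> np.ndarray:
    """Gaussian (RBF) Gram matrix; sigma^2 is the median squared pairwise distance."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    sq = np.sum(a ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * a @ a.T, 0.0)
    positive = d2[d2 > 0]
    sigma2 = np.median(positive) if positive.size else 1.0
    return np.exp(-d2 / (2.0 * sigma2))


def hsic(x: np.ndarray, z: np.ndarray) -> float:
    """Biased HSIC estimate: trace(K H L H) / (n - 1)^2, with H the centering matrix."""
    n = len(x)
    h = np.eye(n) - np.full((n, n), 1.0 / n)
    k = _gaussian_gram(x)
    l = _gaussian_gram(z)
    return float(np.trace(k @ h @ l @ h)) / (n - 1) ** 2


def conditional_dependence_score(x: np.ndarray, z: np.ndarray, y) -> float:
    """Hypothetical score: class-frequency-weighted HSIC between x and z within each class of y."""
    y = np.asarray(y)
    n = len(y)
    score = 0.0
    for label in np.unique(y):
        idx = y == label
        m = int(idx.sum())
        if m < 2:
            continue  # HSIC is undefined for fewer than two samples
        score += (m / n) * hsic(x[idx], z[idx])
    return score


# Usage: a pseudo-label built from the input should score higher than pure noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(300, 5))
y = rng.integers(0, 2, size=300)
print(conditional_dependence_score(x, x[:, :2], y))
print(conditional_dependence_score(x, rng.normal(size=(300, 2)), y))
```

Under the assumption that pseudo-labels closer to conditional independence given Y are preferred, a lower score would rank a candidate higher. Because the score needs only kernel matrices over a sample of data, no SSL model has to be pretrained to compute it, which is the sense in which such an estimator is training-free.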
Findings
Strong correlation between utility estimates and downstream performance
Effective pseudo-label selection reduces computational costs
Validated on speaker recognition and speech recognition tasks
Abstract
Through solving pretext tasks, self-supervised learning (SSL) leverages unlabeled data to extract useful latent representations that replace traditional input features in downstream tasks. A common pretext task consists of pretraining an SSL model on pseudo-labels derived from the original signal. This technique is particularly relevant for speech data, where various meaningful signal processing features may serve as pseudo-labels. However, the process of selecting pseudo-labels, for speech or other types of data, remains mostly unexplored and currently relies on observing the results on the final downstream task. This methodology does not scale, because each candidate set of pseudo-labels must be pretrained and evaluated, incurring substantial computational (hence carbon) costs. Thus, this paper introduces a practical and theoretical framework to select relevant pseudo-labels with respect to a given downstream task. More precisely, we…
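To make "signal processing features as pseudo-labels" concrete, the sketch below turns a raw waveform into frame-level targets that an SSL model could be pretrained to predict. The two features (log energy and zero-crossing rate), the 400-sample window with 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate), and the function name `frame_pseudo_labels` are hypothetical illustrations; the summary does not list the paper's actual pseudo-label set.

```python
import numpy as np


def frame_pseudo_labels(signal, frame_len=400, hop=160, eps=1e-10):
    """Return an (n_frames, 2) array of [log energy, zero-crossing rate] per frame."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds the sample indices of frame i: [i * hop, i * hop + frame_len).
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = signal[idx]
    # Log of the summed squared amplitude; eps avoids log(0) on silent frames.
    log_energy = np.log(np.sum(frames ** 2, axis=1) + eps)
    # Fraction of adjacent sample pairs whose sign differs (signbit treats 0 as non-negative).
    neg = np.signbit(frames)
    zcr = np.mean(neg[:, 1:] != neg[:, :-1], axis=1)
    return np.stack([log_energy, zcr], axis=1)


# Usage: one second of a 1 kHz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t + 0.1)
labels = frame_pseudo_labels(tone)
print(labels.shape)  # (98, 2)
```

Each row is one frame-level regression target; in practice, many more candidate features would be computed, and choosing among them without pretraining on each is exactly the selection problem the abstract describes.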
