A comparison of self-supervised speech representations as input features for unsupervised acoustic word embeddings
Lisa van Staden, Herman Kamper

TL;DR
This paper evaluates various self-supervised frame-level speech representations as inputs for unsupervised acoustic word embeddings, demonstrating that CPC features outperform traditional MFCCs and transfer effectively across languages.
Contribution
It compares frame-level features from contrastive predictive coding (CPC), autoregressive predictive coding (APC), and a correspondence autoencoder (CAE) against MFCCs as inputs to an unsupervised AWE model, highlighting CPC's superior performance and cross-lingual transferability.
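CPC learns its frame-level features with the InfoNCE objective (also listed under Methods). A context vector predicts the latent representation of a future frame, and the loss is the cross-entropy of picking the true future latent out of a candidate set that also contains negative samples. The sketch below computes that loss for a single prediction. It assumes the prediction has already been projected into latent space (in standard CPC this is a step-specific linear map of the context) and scores candidates by dot product. The function name and toy vectors are illustrative and are not taken from the paper.

```python
import math

def info_nce_loss(prediction, positive, negatives):
    """InfoNCE loss for one prediction (illustrative sketch).

    Each candidate is scored by its dot product with the prediction; the loss
    is -log softmax(scores)[0], i.e. cross-entropy with the positive at index 0.
    """
    candidates = [positive] + list(negatives)
    scores = [sum(p * z for p, z in zip(prediction, c)) for c in candidates]
    m = max(scores)  # subtract the max before exponentiating for stability
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - scores[0]

# With a zero prediction every candidate scores 0, so the loss is log(3).
print(info_nce_loss([0.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]]))  # ~1.0986
```

Minimising this loss pushes the prediction toward the true future latent and away from the negatives, which is what makes the objective contrastive.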
Findings
CPC features outperform MFCCs in word discrimination tasks.
All self-supervised features improve over traditional MFCCs.
CPC features trained on English transfer well to Xitsonga.
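Word discrimination with AWEs is commonly scored with a same-different task. Every pair of word segments is ranked by embedding distance, and average precision measures how well same-word pairs rank ahead of different-word pairs. The sketch below assumes cosine distance and this pair-ranking protocol. The function names are hypothetical, and this is not the authors' evaluation code.

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    """1 - cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def same_different_ap(embeddings, labels):
    """Average precision of ranking all segment pairs by embedding distance.

    A pair counts as a positive if both segments carry the same word label.
    """
    pairs = []
    for i, j in combinations(range(len(embeddings)), 2):
        d = cosine_distance(embeddings[i], embeddings[j])
        pairs.append((d, labels[i] == labels[j]))
    pairs.sort(key=lambda p: p[0])  # closest (most similar) pairs first
    n_pos = sum(1 for _, same in pairs if same)
    if n_pos == 0:
        raise ValueError("need at least one same-word pair")
    hits, precision_sum = 0, 0.0
    for rank, (_, same) in enumerate(pairs, start=1):
        if same:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / n_pos

embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(same_different_ap(embs, ["cat", "cat", "dog", "dog"]))  # 1.0
```

In the usage example both same-word pairs are the two closest pairs, so the ranking is perfect. Relabelling the same vectors so that distant segments share a word pushes the positives down the ranking and lowers the score.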
Abstract
Many speech processing tasks involve measuring the acoustic similarity between speech segments. Acoustic word embeddings (AWE) allow for efficient comparisons by mapping speech segments of arbitrary duration to fixed-dimensional vectors. For zero-resource speech processing, where unlabelled speech is the only available resource, some of the best AWE approaches rely on weak top-down constraints in the form of automatically discovered word-like segments. Rather than learning embeddings at the segment level, another line of zero-resource research has looked at representation learning at the short-time frame level. Recent approaches include self-supervised predictive coding and correspondence autoencoder (CAE) models. In this paper we consider whether these frame-level features are beneficial when used as inputs for training an unsupervised AWE model. We compare frame-level features from…
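Mapping a variable-length segment to a fixed-dimensional vector can be illustrated with downsampling, a simple baseline that keeps a fixed number of evenly spaced frames and concatenates them. This is not the unsupervised AWE model studied in the paper, only a minimal sketch of the fixed-dimensional idea. The frame vectors could be MFCCs or self-supervised features such as CPC outputs; `k` and the function names are illustrative assumptions.

```python
import math

def downsample_embed(frames, k=10):
    """Map a (T x D) list of frame vectors to one k*D vector (baseline sketch).

    Picks k evenly spaced frames (frames repeat when T < k) and concatenates them.
    """
    T = len(frames)
    if T == 0:
        raise ValueError("segment has no frames")
    idx = [(i * T) // k for i in range(k)]  # always in [0, T-1]
    return [x for i in idx for x in frames[i]]

def cosine_distance(a, b):
    """1 - cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

seg_a = [[float(t), 1.0] for t in range(23)]  # 23 frames, 2 dims each
seg_b = [[float(t), 1.0] for t in range(7)]   # 7 frames, 2 dims each
emb_a, emb_b = downsample_embed(seg_a), downsample_embed(seg_b)
print(len(emb_a), len(emb_b))  # 20 20
print(cosine_distance(emb_a, emb_b))
```

Because both segments become 20-dimensional vectors regardless of duration, comparing them costs a single vector operation instead of an alignment such as dynamic time warping, which is the efficiency the abstract refers to.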
Taxonomy
Methods: InfoNCE · Contrastive Predictive Coding
