Contrastive prediction strategies for unsupervised segmentation and categorization of phonemes and words
Santiago Cuervo, Maciej Grabias, Jan Chorowski, Grzegorz Ciesielski, Adrian Łańcucki, Paweł Rychlikowski, Ricard Marxer

TL;DR
This paper explores contrastive predictive coding methods for unsupervised phoneme and word segmentation and categorization, identifying a trade-off caused by context networks and proposing a multi-level model to improve both tasks.
Contribution
The paper introduces multi-level ACPC (mACPC), a novel variation of CPC that enhances phoneme categorization and achieves state-of-the-art word segmentation performance.
Findings
mACPC outperforms previous models in categorization metrics
mACPC achieves state-of-the-art results in word segmentation
Using multi-level modeling reduces the trade-off between segmentation and categorization
Abstract
We investigate the performance on phoneme categorization and phoneme and word segmentation of several self-supervised learning (SSL) methods based on Contrastive Predictive Coding (CPC). Our experiments show that with the existing algorithms there is a trade-off between categorization and segmentation performance. We investigate the source of this conflict and conclude that the use of context-building networks, albeit necessary for superior performance on categorization tasks, harms segmentation performance by causing a temporal shift in the learned representations. Aiming to bridge this gap, we take inspiration from the leading approach to segmentation, which simultaneously models the speech signal at the frame and phoneme level, and incorporate multi-level modelling into Aligned CPC (ACPC), a variation of CPC which exhibits the best performance on categorization tasks. Our multi-level…
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
Methods: InfoNCE · Contrastive Predictive Coding
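
For readers unfamiliar with the InfoNCE objective listed under Methods, the following is a minimal, generic sketch of the loss in NumPy. It is not the paper's implementation: the function name `info_nce_loss`, the `temperature` parameter, the use of dot-product scores, and the choice of in-batch negatives are all assumptions made for illustration. In CPC-style models the "predictions" would typically come from a context network and the "targets" from encoded future frames.

```python
import numpy as np

def info_nce_loss(predictions, targets, temperature=1.0):
    """Generic InfoNCE sketch (hypothetical helper, not from the paper).

    predictions[i] is expected to match targets[i]; every other row of
    `targets` serves as a negative example for row i.
    """
    # Dot-product similarity between each prediction and each candidate, shape (N, N).
    logits = predictions @ targets.T / temperature
    # Numerically stable log-softmax over the candidates in each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pair for row i sits on the diagonal.
    return -np.mean(np.diag(log_probs))

# Toy data: 8 random 16-dimensional "future frame" encodings.
rng = np.random.default_rng(0)
targets = rng.normal(size=(8, 16))

# Predictions aligned with their targets should score better (lower loss)
# than predictions paired with the wrong targets.
aligned = info_nce_loss(targets, targets)
shuffled = info_nce_loss(targets, np.roll(targets, 1, axis=0))
```

With uninformative predictions (all zeros) every candidate is equally likely, so the loss equals log N, where N is the number of candidates. That value is a useful baseline when checking whether a model has learned anything.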
