Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw

Jan Chorowski; Grzegorz Ciesielski; Jarosław Dzikowski; Adrian Łańcucki; Ricard Marxer; Mateusz Opala; Piotr Pusz; Paweł Rychlikowski; Michał Stypułkowski

arXiv:2106.11603·cs.LG·September 13, 2024

TL;DR

This paper explores low-resource speech processing methods for the Zero Resource Speech Challenge 2021, showing that simple refinement techniques improve unsupervised speech representations, making them more suitable for pattern matching and retrieval tasks.

Contribution

The paper introduces low-resource refinement techniques for CPC-based speech representations that make them more useful for pattern matching in zero-resource scenarios.

Findings

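01. Refined CPC representations outperform the baseline in pattern matching.

02. Simple refinement methods can narrow the gap to, or even improve upon, approaches that use a high computational budget.

03. CPC-derived representations are stable enough for pattern matching and retrieval, but still too noisy for training language models.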

Abstract

We present a number of low-resource approaches to the tasks of the Zero Resource Speech Challenge 2021. We build on the unsupervised representations of speech proposed by the organizers as a baseline, derived from CPC and clustered with the k-means algorithm. We demonstrate that simple methods of refining those representations can narrow the gap to, or even improve upon, solutions that use a high computational budget. The results lead to the conclusion that the CPC-derived representations are still too noisy for training language models, but stable enough for simpler forms of pattern matching and retrieval.
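
The abstract describes baseline representations that are derived from CPC and then clustered with k-means. The sketch below illustrates only that clustering step, under loud assumptions: the toy 2-D `frames` stand in for real CPC frame features, the function names are hypothetical, and the farthest-point initialization is chosen here for determinism. None of this is taken from the authors' repository.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(frame, centroids):
    """Index of the centroid closest to `frame` (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda i: sqdist(frame, centroids[i]))


def init_centroids(frames, k):
    """Farthest-point initialization (an assumption made for determinism)."""
    centroids = [list(frames[0])]
    while len(centroids) < k:
        # Pick the frame whose distance to its closest centroid is largest.
        far = max(frames, key=lambda f: min(sqdist(f, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(frames, k, iters=20):
    """Plain Lloyd's k-means over a list of feature vectors."""
    centroids = init_centroids(frames, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for f in frames:
            buckets[nearest(f, centroids)].append(f)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster goes empty
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def quantize(frames, centroids):
    """Map each frame to the index of its nearest centroid (a discrete unit)."""
    return [nearest(f, centroids) for f in frames]


# Toy stand-in for CPC features: two well-separated groups of 2-D frames.
frames = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
          (5.0, 5.1), (5.1, 5.0), (5.0, 5.0),
          (0.1, 0.1)]
centroids = kmeans(frames, k=2)
units = quantize(frames, centroids)
print(units)
```

Each frame is replaced by the index of its nearest centroid, so an utterance becomes a sequence of discrete units that can be compared or looked up. Real CPC features are high-dimensional and the number of clusters is far larger than 2; the toy sizes here are only for readability.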

Code & Models

Repositories

chorowski-lab/zs2021 (PyTorch, official)
