The Zero Resource Speech Challenge 2021: Spoken language modelling

Ewan Dunbar; Mathieu Bernard; Nicolas Hamilakis; Tu Anh Nguyen; Maureen de Seyssel; Patricia Rozé; Morgane Rivière; Eugene Kharitonov; Emmanuel Dupoux

arXiv:2104.14700·cs.CL·August 11, 2021

TL;DR

The paper introduces the 2021 Zero Resource Speech Challenge, encouraging models to learn spoken language representations directly from audio without text labels, using a large speech dataset and multiple evaluation metrics.

Contribution

It presents a new challenge framework and baseline system for unsupervised spoken language modeling from raw audio, fostering progress in zero-resource speech processing.

Findings

01

Multiple submitted systems show promising results across evaluation metrics

02

Baseline system demonstrates the feasibility of unsupervised speech representation learning

03

Results highlight the challenges and potential directions for future research

Abstract

We present the Zero Resource Speech Challenge 2021, which asks participants to learn a language model directly from audio, without any text or labels. The challenge is based on the Libri-light dataset, which provides up to 60k hours of audio from English audio books without any associated text. We provide a pipeline baseline system consisting of an encoder based on contrastive predictive coding (CPC), a quantizer ($k$-means) and a standard language model (BERT or LSTM). The metrics evaluate the learned representations at the acoustic (ABX discrimination), lexical (spot-the-word), syntactic (acceptability judgment) and semantic levels (similarity judgment). We present an overview of the eight submitted systems from four groups and discuss the main results.
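The encoder, quantizer and language model pipeline described in the abstract can be illustrated with a toy sketch. This is not the challenge baseline: the hand-written 2-D frames stand in for CPC encoder outputs, the centroids are assumed to be already fitted, and an add-one smoothed bigram model replaces the BERT or LSTM language model. All names (`quantize`, `BigramLM`, `pair_accuracy`) are hypothetical. The scoring function only mirrors the general idea behind a spot-the-word style metric, namely checking whether the model assigns a higher probability to the real item of a pair than to the fake one.

```python
import math
from collections import Counter


def quantize(frames, centroids):
    """Map each frame to the index of its nearest centroid (k-means assignment step)."""
    units = []
    for frame in frames:
        dists = [
            sum((f - c) ** 2 for f, c in zip(frame, centroid))
            for centroid in centroids
        ]
        units.append(dists.index(min(dists)))
    return units


class BigramLM:
    """Add-one smoothed bigram model over unit ids (toy stand-in for BERT/LSTM)."""

    def __init__(self, num_units):
        self.num_units = num_units
        self.bigrams = Counter()
        self.contexts = Counter()

    def fit(self, sequences):
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, seq):
        """Sum of smoothed log P(cur | prev) over the sequence."""
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + self.num_units
            total += math.log(num / den)
        return total


def pair_accuracy(lm, pairs):
    """Fraction of (real, fake) pairs where the real item scores higher."""
    correct = sum(lm.log_prob(real) > lm.log_prob(fake) for real, fake in pairs)
    return correct / len(pairs)


# Assumed toy data: three fitted centroids and four encoder frames.
centroids = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (0.1, 0.8), (0.0, 0.1)]
units = quantize(frames, centroids)

# Train the stand-in language model on discrete unit sequences.
lm = BigramLM(num_units=len(centroids))
lm.fit([[0, 1, 2, 0, 1, 2], [0, 1, 2]])

# One (real, fake) pair of equal length: a seen order versus its reversal.
accuracy = pair_accuracy(lm, [([0, 1, 2], [2, 1, 0])])
```

The pair in this example has equal length, so raw log-probabilities are directly comparable; with pairs of different lengths, some form of length normalization would likely be needed.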

Taxonomy

Methods: InfoNCE · Contrastive Predictive Coding