Evaluating context-invariance in unsupervised speech representations

Mark Hallap; Emmanuel Dupoux; Ewan Dunbar

arXiv:2210.15775·cs.CL·June 1, 2023·1 cites

TL;DR

This paper introduces a new benchmark to measure context-invariance in unsupervised speech representations, revealing its importance for stable word-level encoding and guiding future research directions.

Contribution

It develops a novel version of the ZeroSpeech ABX benchmark specifically for assessing context-invariance in speech representations.

Findings

01. Context-invariance correlates with word-level stability.

02. Current models vary significantly in context-invariance.

03. Improving context-invariance could enhance language understanding.

Abstract

Unsupervised speech representations have taken off, with benchmarks (SUPERB, ZeroSpeech) demonstrating major progress on semi-supervised speech recognition, speech synthesis, and speech-only language modelling. Inspiration comes from the promise of "discovering the phonemes" of a language or a similar low-bitrate encoding. However, one of the critical properties of phoneme transcriptions is context-invariance: the phonetic context of a speech sound can have massive influence on the way it is pronounced, while the text remains stable. This is what allows tokens of the same word to have the same transcriptions, which is key to language understanding. Current benchmarks do not measure context-invariance. We develop a new version of the ZeroSpeech ABX benchmark that measures context-invariance, and apply it to recent self-supervised representations. We demonstrate that the context-independence…
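To make the idea of an ABX test with and without matched context more concrete, here is a minimal sketch in Python. It is not the benchmark's implementation: the token format `(phone, context, vector)`, the function names, the cosine distance on single vectors, the toy data, and the exact way the cross-context condition is built (A and B share a context, X comes from a different one) are all assumptions made for illustration. The general ABX rule it uses is the standard one: a triple counts as an error when X, which belongs to the same category as A, is farther from A than from B.

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (hypothetical choice of metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def abx_error(tokens, same_context):
    """ABX error rate over all valid (A, B, X) triples.

    tokens: list of (phone, context, vector) tuples (assumed toy format).
    same_context=True  -> A, B and X all share one context.
    same_context=False -> A and B share a context, X comes from another
                          (an assumed way to probe context-invariance).
    """
    errors, total = 0.0, 0
    n = len(tokens)
    for ia in range(n):
        for ib in range(n):
            for ix in range(n):
                a, b, x = tokens[ia], tokens[ib], tokens[ix]
                # A and X: distinct tokens of the same phone; B: a different phone.
                if ia == ix or a[0] != x[0] or b[0] == x[0]:
                    continue
                if a[1] != b[1]:
                    continue
                if same_context and x[1] != a[1]:
                    continue
                if not same_context and x[1] == a[1]:
                    continue
                d_ax = cosine_distance(a[2], x[2])
                d_bx = cosine_distance(b[2], x[2])
                if d_ax > d_bx:
                    errors += 1.0   # X is closer to the wrong category
                elif d_ax == d_bx:
                    errors += 0.5   # ties count as chance
                total += 1
    return errors / total if total else float("nan")

# Toy data: the "a" token in context d_d is shifted towards the "i" region,
# mimicking a representation whose vectors depend on phonetic context.
tokens = [
    ("a", "b_b", (1.0, 0.0)),
    ("a", "b_b", (0.9, 0.1)),
    ("i", "b_b", (0.0, 1.0)),
    ("i", "b_b", (0.1, 0.9)),
    ("a", "d_d", (0.6, 0.8)),
    ("i", "d_d", (0.0, 1.0)),
]

within = abx_error(tokens, same_context=True)
cross = abx_error(tokens, same_context=False)
print(f"within-context ABX error: {within:.3f}")
print(f"cross-context ABX error:  {cross:.3f}")
```

In this toy setup the representation separates the two phones well when context is held fixed, but makes errors once X is drawn from a different context. A gap of that kind between the two conditions is what a context-invariance measurement is meant to expose.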

Code & Models

Repositories

perceptimatic/context-invariance-paper (Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques