Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models

Maxime Fily; Guillaume Wisniewski; Severine Guillaume; Gilles Adda; Alexis Michaud

arXiv:2402.05581·cs.CL·February 9, 2024·1 cites

Open Access

TL;DR

This paper introduces an unsupervised ABX testing method using large-scale cross-lingual models to analyze and compare speech representations across different audio dimensions, especially in low-resource language contexts.

Contribution

It presents a novel unsupervised approach to evaluate speech representations along multiple audio dimensions using ABX tests, applicable to under-documented languages.

Findings

01

Representations extracted from recordings with different linguistic or extra-linguistic characteristics differ along those same lines.

02

Embedding more audio signal in a single vector improves discrimination of extra-linguistic characteristics.

03

Shorter snippets are better for segmental phonetic distinctions.
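
Findings 02 and 03 both turn on how much audio is summarized in one vector. The sketch below shows one way to vary that quantity: consecutive frame-level vectors are grouped into snippets of a chosen length and each snippet is averaged into a single vector. Mean pooling, the function names `mean_pool` and `pool_snippets`, and the toy frames are assumptions made for illustration; this page does not state how the paper builds its snippet vectors.

```python
def mean_pool(frames):
    """Average a list of equal-length frame vectors into one vector."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def pool_snippets(frames, frames_per_snippet):
    """Cut `frames` into consecutive, non-overlapping snippets and mean-pool each.

    A trailing group with fewer than `frames_per_snippet` frames is dropped,
    so every output vector summarizes the same amount of audio.
    """
    if frames_per_snippet < 1:
        raise ValueError("frames_per_snippet must be at least 1")
    n_snippets = len(frames) // frames_per_snippet
    return [
        mean_pool(frames[i * frames_per_snippet:(i + 1) * frames_per_snippet])
        for i in range(n_snippets)
    ]


# Toy frame-level vectors (hypothetical values, 2 dimensions, 5 frames).
toy_frames = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0], [6.0, 8.0], [8.0, 10.0]]

short_snippets = pool_snippets(toy_frames, 2)  # two vectors, 2 frames each
long_snippets = pool_snippets(toy_frames, 4)   # one vector, 4 frames
```

Raising `frames_per_snippet` yields fewer vectors that each cover more signal, which is the setting finding 02 associates with extra-linguistic characteristics; lowering it gives the shorter snippets that finding 03 associates with segmental phonetic distinctions.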

Abstract

In the highly constrained context of low-resource language studies, we explore vector representations of speech from a pretrained model to determine their level of abstraction with regard to the audio signal. We propose a new unsupervised method using ABX tests on audio recordings with carefully curated metadata to shed light on the type of information present in the representations. ABX tests determine whether the representations computed by a multilingual speech model encode a given characteristic. Three experiments are devised: one on room acoustics aspects, one on linguistic genre, and one on phonetic aspects. The results confirm that the representations extracted from recordings with different linguistic/extra-linguistic characteristics differ along the same lines. Embedding more audio signal in one vector better discriminates extra-linguistic characteristics, whereas shorter…
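
To make the ABX idea concrete, here is a minimal sketch of an ABX discrimination score over fixed-size vectors. For each triplet, A and X come from one category (for example, recordings made in the same room) and B from another; the triplet counts as correct when X is closer to A than to B. The cosine distance, the 0.5 credit for ties, the function names, and the toy vectors are all assumptions made for illustration and are not taken from the paper.

```python
import math
from itertools import permutations, product


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def abx_score(category_1, category_2, distance=cosine_distance):
    """Fraction of (A, B, X) triplets in which X is closer to A than to B.

    A and X are two distinct items of `category_1`; B is an item of
    `category_2`. A score near 1.0 suggests the vectors separate the two
    categories; a score near 0.5 is chance level.
    """
    correct = 0.0
    total = 0
    for (a, x), b in product(permutations(category_1, 2), category_2):
        d_ax = distance(a, x)
        d_bx = distance(b, x)
        if d_ax < d_bx:
            correct += 1.0
        elif d_ax == d_bx:
            correct += 0.5  # assumption: a tie earns half credit
        total += 1
    if total == 0:
        raise ValueError("need two or more items in category_1 and one in category_2")
    return correct / total


# Hypothetical 2-dimensional vectors standing in for pooled speech representations.
room_a = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]
room_b = [[0.1, 1.0], [0.0, 0.9]]
score = abx_score(room_a, room_b)
```

Because the score needs only category labels drawn from metadata (room, genre, phone identity), the same function can be reused across the three kinds of experiment the abstract lists by swapping which label defines the categories.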

Taxonomy

Topics: Speech Recognition and Synthesis · Language and cultural evolution · Phonetics and Phonology Research