A Theoretical Framework for Acoustic Neighbor Embeddings

Woojay Jeon

arXiv:2412.02164·eess.AS·December 4, 2024

Open Access

TL;DR

This paper introduces a probabilistic theoretical framework for acoustic neighbor embeddings, which supports principled interpretation of the embeddings and their application to phonetic similarity problems. The framework is validated empirically on diverse tasks, including word classification and dialect clustering.

Contribution

The paper provides a novel probabilistic interpretation of the distances between acoustic neighbor embeddings and demonstrates the embeddings' effectiveness in phonetic tasks, supported by theoretical and empirical evidence.

Findings

01 Nearest-neighbor search matches FST accuracy for large vocabularies

02 Embedding distances closely approximate phone edit distances in OOV word recovery

03 Clustering hierarchies align with human listening experiments
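As a point of reference for the second finding, the sketch below shows one common way to compute an edit distance between two phone sequences. It assumes a plain Levenshtein distance with unit costs for insertion, deletion, and substitution; this page does not say which definition the paper uses, and the function name and the phone symbols are illustrative only.

```python
def phone_edit_distance(a, b):
    """Unit-cost Levenshtein distance between two phone sequences.

    `a` and `b` are lists of phone symbols (hypothetical examples below).
    This is an assumed definition, not necessarily the paper's.
    """
    # prev[j] holds the distance between the first i-1 phones of a
    # and the first j phones of b.
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, pb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if pa == pb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


# Illustrative phone sequences (ARPAbet-style symbols, chosen for the example).
cat = ["K", "AE", "T"]
cot = ["K", "AA", "T"]
bats = ["B", "AE", "T", "S"]
```

Under this definition, `phone_edit_distance(cat, cot)` counts one substitution, and `phone_edit_distance(cat, bats)` counts one substitution plus one insertion.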

Abstract

This paper provides a theoretical framework for interpreting acoustic neighbor embeddings, which are representations of the phonetic content of variable-width audio or text in a fixed-dimensional embedding space. A probabilistic interpretation of the distances between embeddings is proposed, based on a general quantitative definition of phonetic similarity between words. This gives us a framework for understanding and applying the embeddings in a principled manner. Theoretical and empirical evidence is shown to support an approximation of uniform cluster-wise isotropy, which allows us to reduce the distances to simple Euclidean distances. Four experiments are described that validate the framework and demonstrate how it can be applied to diverse problems. Nearest-neighbor search between audio and text embeddings can give isolated word classification accuracy that is identical to that…
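The nearest-neighbor search mentioned in the abstract can be pictured with a minimal sketch: given one audio embedding and a table of text embeddings, pick the word whose text embedding is closest in Euclidean distance. The function name, the two-dimensional vectors, and the vocabulary below are all hypothetical; the embeddings in the paper come from trained encoders that are not reproduced here, and a large vocabulary would call for an indexed search rather than this linear scan.

```python
import math


def nearest_word(audio_embedding, text_embeddings):
    """Return (word, distance) for the text embedding closest to the audio embedding.

    `text_embeddings` maps each vocabulary word to a vector with the same
    dimension as `audio_embedding`. Distances are plain Euclidean.
    """
    best_word, best_dist = None, math.inf
    for word, vec in text_embeddings.items():
        dist = math.dist(audio_embedding, vec)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word, best_dist


# Toy vectors made up for illustration; real embeddings have many more dimensions.
vocab = {
    "cat": [0.0, 1.0],
    "cap": [0.2, 0.9],
    "dog": [3.0, -1.0],
}
query = [0.1, 1.0]
word, dist = nearest_word(query, vocab)
```

With these toy vectors the query lies closest to "cat", slightly farther from "cap", and far from "dog", which mirrors the idea that phonetically similar words sit near each other in the embedding space.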

Taxonomy

Topics: Speech and Audio Processing