Asymmetric Proxy Loss for Multi-View Acoustic Word Embeddings

Myunghun Jung; Hoirin Kim

arXiv:2203.16080 · eess.AS · June 28, 2022

TL;DR

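This paper introduces an asymmetric-proxy loss for multi-view acoustic word embeddings (AWEs) that improves word discrimination. It recasts multi-view training as proxy-based deep metric learning by treating acoustically grounded word embeddings (AGWEs) as proxies, and it allows anchor-positive and anchor-negative pairs to be handled by different loss functions.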

Contribution

Within a proxy-based framework for multi-view acoustic word embeddings, the paper proposes a novel asymmetric-proxy loss that combines parts of existing loss functions asymmetrically, with the aim of making the learned speech representations more discriminative.
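In the proxy-based view, each acoustically grounded word embedding (AGWE) stands in for its word class, so the similarity matrix becomes data-to-proxy: rows are acoustic word embeddings (AWEs) from the batch and columns are word proxies, instead of a segment-by-segment matrix. The following is a minimal sketch in plain Python, assuming cosine similarity and exactly one AGWE per word in the batch; the function names are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def data_to_proxy_similarity(
    awes: Sequence[Vector], agwes: Sequence[Vector]
) -> List[List[float]]:
    """Cosine similarity S[i][c] between acoustic embedding i and proxy c.

    A data-to-data setup would compare the B segments with each other (B x B);
    here the columns are the C word-level proxies (AGWEs), giving B x C.
    """
    a = [l2_normalize(v) for v in awes]
    p = [l2_normalize(v) for v in agwes]
    return [[sum(x * y for x, y in zip(ai, pc)) for pc in p] for ai in a]


def pair_masks(
    labels: Sequence[int], num_proxies: int
) -> Tuple[List[List[bool]], List[List[bool]]]:
    """Positive mask is True where segment i's word label equals proxy index c."""
    pos = [[labels[i] == c for c in range(num_proxies)] for i in range(len(labels))]
    neg = [[not m for m in row] for row in pos]
    return pos, neg


# Usage: three segments, two word proxies; segments 0 and 2 are word 0.
awes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
agwes = [[2.0, 0.0], [0.0, 3.0]]
S = data_to_proxy_similarity(awes, agwes)
pos_mask, neg_mask = pair_masks([0, 1, 0], num_proxies=len(agwes))
```

With S and the two masks in hand, a general pair-weighting loss can be evaluated over positive and negative data-to-proxy entries rather than over segment pairs.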

Findings

01. The proposed asymmetric-proxy loss outperforms existing proxy-based losses.

02. The method improves word discrimination performance on the WSJ corpus.

03. Experimental results validate the effectiveness of the new loss function.

Abstract

Acoustic word embeddings (AWEs) are discriminative representations of speech segments, and the learned embedding space reflects the phonetic similarity between words. With multi-view learning, in which text labels are used as supplementary input, AWEs are jointly trained with acoustically grounded word embeddings (AGWEs). In this paper, we extend the multi-view approach into a proxy-based framework for deep metric learning by equating AGWEs with proxies. A simple modification to how the similarity matrix is computed allows general pair weighting to formulate the data-to-proxy relationship. Under this systematized framework, we propose an asymmetric-proxy loss that combines different parts of loss functions asymmetrically while keeping their merits. It follows the assumptions that the optimal function for anchor-positive pairs may differ from the one for anchor-negative pairs, and a proxy may…
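The abstract's central assumption, that the best function for anchor-positive pairs need not be the best one for anchor-negative pairs, can be illustrated on a data-to-proxy similarity matrix. The sketch is a hypothetical combination chosen only for illustration: a per-segment logistic term for positives and a Proxy-Anchor-style LogSumExp term for negatives, with made-up scales and margins. It is not the loss defined in the paper, whose exact formulation is not reproduced on this page.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def asymmetric_proxy_loss_sketch(
    sim: Sequence[Sequence[float]],
    labels: Sequence[int],
    alpha_pos: float = 16.0,  # hypothetical scale for anchor-positive pairs
    margin_pos: float = 0.5,  # hypothetical margin for anchor-positive pairs
    alpha_neg: float = 32.0,  # hypothetical scale for anchor-negative pairs
    margin_neg: float = 0.1,  # hypothetical margin for anchor-negative pairs
) -> float:
    """Hypothetical asymmetric loss over a B x C data-to-proxy similarity matrix.

    Positive and negative parts use different functional forms; this is an
    illustration of the idea, not the loss defined in the paper.
    """
    num_samples, num_proxies = len(sim), len(sim[0])

    # Positive part (data-anchored): each segment is pulled toward its own
    # proxy by an independent logistic term.
    pos = sum(
        softplus(-alpha_pos * (sim[i][labels[i]] - margin_pos))
        for i in range(num_samples)
    ) / num_samples

    # Negative part (proxy-anchored, Proxy-Anchor style): each proxy pushes
    # away all segments of other words through one smooth-max term.
    neg = 0.0
    for c in range(num_proxies):
        terms = [
            alpha_neg * (sim[i][c] + margin_neg)
            for i in range(num_samples)
            if labels[i] != c
        ]
        if terms:
            neg += softplus(logsumexp(terms))  # = log(1 + sum(exp(terms)))
    neg /= num_proxies

    return pos + neg


separated = [[0.9, -0.5], [-0.5, 0.9]]  # each segment closest to its own proxy
confused = [[-0.5, 0.9], [0.9, -0.5]]   # each segment closest to the wrong proxy
labels = [0, 1]
print(asymmetric_proxy_loss_sketch(separated, labels))
print(asymmetric_proxy_loss_sketch(confused, labels))
```

Because the positive and negative parts are separate terms, each can be tuned or replaced independently, which is the design freedom an asymmetric formulation is meant to provide.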

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing