Integrating Form and Meaning: A Multi-Task Learning Model for Acoustic Word Embeddings

Badr M. Abdullah; Bernd Möbius; Dietrich Klakow

arXiv:2209.06633 · cs.CL · September 20, 2022

TL;DR

This paper introduces a multi-task learning model for acoustic word embeddings that integrates high-level lexical knowledge, improving the discriminability and lexical category separation of the embeddings across three languages.

Contribution

The paper presents a multi-task learning approach that combines acoustic cues with lexical knowledge when training acoustic word embeddings (AWEs), a combination that had not previously been explored.

Findings

01. Enhanced discriminability of embeddings

02. Better separation of lexical categories

03. Improved performance across three languages

Abstract

Models of acoustic word embeddings (AWEs) learn to map variable-length spoken word segments onto fixed-dimensionality vector representations such that different acoustic exemplars of the same word are projected nearby in the embedding space. In addition to their speech technology applications, AWE models have been shown to predict human performance on a variety of auditory lexical processing tasks. Current AWE models are based on neural networks and trained in a bottom-up approach that integrates acoustic cues to build up a word representation given an acoustic or symbolic supervision signal. Therefore, these models do not leverage or capture high-level lexical knowledge during the learning process. In this paper, we propose a multi-task learning model that incorporates top-down lexical knowledge into the training procedure of AWEs. Our model learns a mapping between the acoustic input…
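
The core idea in the abstract, mapping variable-length spoken word segments to fixed-dimensionality vectors so that exemplars of the same word land close together, can be illustrated with a minimal sketch. This is not the paper's model: mean pooling stands in for the learned neural encoder, and the function names (`embed_segment`, `cosine_distance`) and the three-dimensional toy "frames" are assumptions made purely for illustration.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def embed_segment(frames: Sequence[Frame]) -> List[float]:
    """Mean-pool a variable-length sequence of feature frames into one vector.

    Hypothetical stand-in for a learned AWE encoder: the output size depends
    only on the frame dimensionality, never on the number of frames.
    """
    if not frames:
        raise ValueError("segment must contain at least one frame")
    dim = len(frames[0])
    sums = [0.0] * dim
    for frame in frames:
        if len(frame) != dim:
            raise ValueError("all frames must have the same dimensionality")
        for i, value in enumerate(frame):
            sums[i] += value
    return [s / len(frames) for s in sums]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


# Toy data (invented): two exemplars of one word with different durations,
# and one exemplar of a different word.
word_a_short = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.1, 0.0, 0.1]]
word_a_long = [[1.0, 0.0, 0.1], [0.8, 0.2, 0.0], [1.0, 0.1, 0.0],
               [1.2, 0.1, 0.0], [0.9, 0.1, 0.1]]
word_b = [[0.0, 1.0, 0.9], [0.1, 1.1, 1.0]]

emb_a1 = embed_segment(word_a_short)
emb_a2 = embed_segment(word_a_long)
emb_b1 = embed_segment(word_b)

same_word = cosine_distance(emb_a1, emb_a2)
diff_word = cosine_distance(emb_a1, emb_b1)
print(f"same-word distance: {same_word:.4f}")
print(f"different-word distance: {diff_word:.4f}")
```

In this toy setting the two exemplars of the same word end up closer than exemplars of different words despite having different lengths, which is the property an AWE model is trained to achieve on real speech.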

Code & Models

Repositories

uds-lsv/semantically_enriched_awes
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing