Disentangling speech from surroundings with neural embeddings

Ahmed Omran; Neil Zeghidour; Zalán Borsos; Félix de Chaumont Quitry; Malcolm Slaney; Marco Tagliasacchi

arXiv:2203.15578·cs.SD·June 6, 2023·1 cites

Open Access

TL;DR

This paper introduces a method that disentangles speech from background noise and reverberation in the embedding space of a neural audio codec, which allows the speech to be separated from its surroundings and the output audio characteristics to be adjusted in a targeted way.

Contribution

A novel training procedure for neural audio codecs that produces structured embeddings, separating speech from environmental factors in the embedding space.

Findings

01. Successful separation of speech from noise and reverberation

02. Ability to modify audio characteristics through embedding manipulation

03. Structured embeddings enable targeted audio editing

Abstract

We present a method to separate speech signals from noisy environments in the embedding space of a neural audio codec. We introduce a new training procedure that allows our model to produce structured encodings of audio waveforms given by embedding vectors, where one part of the embedding vector represents the speech signal, and the rest represent the environment. We achieve this by partitioning the embeddings of different input waveforms and training the model to faithfully reconstruct audio from mixed partitions, thereby ensuring each partition encodes a separate audio attribute. As use cases, we demonstrate the separation of speech from background noise or from reverberation characteristics. Our method also allows for targeted adjustments of the audio output characteristics.
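The partition-mixing step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that an embedding is a list of frames, that the first `split` channels of each frame hold the speech partition, and that the remaining channels hold the environment partition. The name `mix_partitions`, the `split` index, and the toy values are hypothetical and not taken from the paper; the codec's encoder, decoder, and training loss are omitted.

```python
from typing import List

# Assumed layout: one inner list of channel values per time frame.
Embedding = List[List[float]]


def mix_partitions(speech_source: Embedding,
                   env_source: Embedding,
                   split: int) -> Embedding:
    """Build a mixed embedding from two inputs.

    Channels [0, split) are taken from `speech_source` and channels
    [split, end) from `env_source`. The position of the split is an
    assumption made for this sketch.
    """
    if len(speech_source) != len(env_source):
        raise ValueError("both embeddings need the same number of frames")
    mixed: Embedding = []
    for frame_s, frame_e in zip(speech_source, env_source):
        if len(frame_s) != len(frame_e):
            raise ValueError("frames need the same number of channels")
        if not 0 < split < len(frame_s):
            raise ValueError("split must fall strictly inside the frame")
        # Speech partition from one input, environment partition from the other.
        mixed.append(frame_s[:split] + frame_e[split:])
    return mixed


# Toy embeddings standing in for the encoder outputs of two waveforms.
emb_a = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
emb_b = [[-1.0, -2.0, -3.0, -4.0], [-5.0, -6.0, -7.0, -8.0]]

mixed = mix_partitions(emb_a, emb_b, split=3)
print(mixed)  # [[1.0, 2.0, 3.0, -4.0], [5.0, 6.0, 7.0, -8.0]]
```

In a training setup of the kind the abstract describes, a decoder would be asked to reconstruct audio from such a mixed embedding, so that each partition is pushed to carry only its own attribute. How the reconstruction target is formed is not specified here and is left out of the sketch.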

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation