AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries

Woosung Choi; Minseok Kim; Marco A. Martínez Ramírez; Jaehwa Chung; Soonyoung Jung

arXiv:2104.13553·eess.AS·April 29, 2021

TL;DR

This paper introduces AMSS-Net, a neural network that applies audio transformations to user-specified sources (e.g., vocals) of a track according to a textual description, while preserving the sources the description does not mention.

Contribution

The paper presents AMSS-Net, a neural network architecture that extracts latent audio sources and selectively manipulates them according to a user query. It targets the "transparency" problem in audio: a single waveform sample or frequency bin usually carries information from multiple sources.

Findings

01 AMSS-Net outperforms the baselines on several AMSS tasks.

02 The proposed evaluation benchmark covers several AMSS tasks and measures source-specific audio manipulation.

03 Objective metrics and empirical verification both support the comparison against the baselines.

Abstract

This paper proposes a neural network that performs audio transformations to user-specified sources (e.g., vocals) of a given audio track according to a given description while preserving other sources not mentioned in the description. Audio Manipulation on a Specific Source (AMSS) is challenging because a sound object (i.e., a waveform sample or frequency bin) is "transparent"; it usually carries information from multiple sources, in contrast to a pixel in an image. To address this challenging problem, we propose AMSS-Net, which extracts latent sources and selectively manipulates them while preserving irrelevant sources. We also propose an evaluation benchmark for several AMSS tasks, and we show that AMSS-Net outperforms baselines on several AMSS tasks via objective metrics and empirical verification.
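
The task described in the abstract can be illustrated with a toy example. The sketch below is not AMSS-Net and not the authors' code; it only builds the ideal output of an AMSS task (here, a hypothetical "halve the volume of vocals" request) from isolated sources, which a real system never observes. Every name in it (`mix`, `amss_target`, `halve_volume`, the sine-wave "sources", the sample rate) is an assumption made for illustration.

```python
import math

SAMPLE_RATE = 8000  # hypothetical sample rate for this toy example


def sine(freq_hz, n_samples, amplitude=1.0):
    """Return n_samples of a sine wave as a list of floats."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(n_samples)]


def mix(sources):
    """Sum the sources sample by sample.

    Each mixture sample blends every source, which is the
    "transparency" the abstract refers to.
    """
    return [sum(samples) for samples in zip(*sources.values())]


def amss_target(sources, target_names, transform):
    """Build the ideal AMSS output: transform only the named sources.

    This needs the isolated sources, so it defines a reference
    signal rather than a way to solve the task.
    """
    manipulated = {
        name: transform(signal) if name in target_names else signal
        for name, signal in sources.items()
    }
    return mix(manipulated)


def halve_volume(signal):
    """Stand-in manipulation: scale the signal by 0.5."""
    return [0.5 * x for x in signal]


# Two stand-in "sources"; real tracks would be recorded stems.
sources = {
    "vocals": sine(440.0, 64),
    "drums": sine(110.0, 64, amplitude=0.8),
}
mixture = mix(sources)
target = amss_target(sources, {"vocals"}, halve_volume)

# The mixture and the target differ only by the vocal contribution.
residual = [m - t for m, t in zip(mixture, target)]
```

A model for this task receives only `mixture` and the textual request and has to approximate `target`. Since no single sample of `mixture` belongs to one source alone, the change cannot be made by editing selected samples directly, which is the motivation the abstract gives for extracting latent sources first.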

Code & Models

Repositories

kuielab/AMSS-Net
PyTorch · Official
