ICGAN: An implicit conditioning method for interpretable feature control of neural audio synthesis

Yunyi Liu; Craig Jin

arXiv:2406.07131·cs.SD·June 12, 2024

TL;DR

This paper introduces ICGAN, a novel implicit conditioning approach for neural audio synthesis that enables interpretable and continuous control over sound features without relying on explicit labels, improving sound manipulation capabilities.

Contribution

It presents a new implicit conditioning method using GANs that creates a continuous feature space for controllable sound synthesis without explicit labels.

Findings

01 Effective timbre manipulation demonstrated
02 Controllable sound variation achieved in-domain and cross-domain
03 Introduces an evaluation metric for controllability

Abstract

Neural audio synthesis methods can achieve high-fidelity and realistic sound generation by utilizing deep generative models. Such models typically rely on external labels, which are often discrete, as conditioning information to achieve guided sound generation. However, it remains difficult to control subtle changes in sounds without appropriate and descriptive labels, especially given a limited dataset. This paper proposes an implicit conditioning method for neural audio synthesis using generative adversarial networks that allows for interpretable control of the acoustic features of synthesized sounds. Our technique creates a continuous conditioning space that enables timbre manipulation without relying on explicit labels. We further introduce an evaluation metric to explore controllability and demonstrate that our approach is effective in enabling a degree of controlled variation of…
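
The idea of a continuous conditioning space can be illustrated with a small sketch: hold the noise input fixed and move the conditioning vector along a straight line, so that any change in the output is attributable to the conditioning alone. This is not the paper's implementation. `toy_generator` is a hypothetical stand-in for a trained generator G(z, c), and its mapping from `c[0]` to harmonic strength, the base-pitch formula, and all function and parameter names are assumptions made purely for illustration.

```python
import math
import random


def lerp(a, b, t):
    """Linearly interpolate between two equal-length vectors."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def toy_generator(z, c, n_samples=256, sample_rate=16000):
    """Hypothetical stand-in for a trained generator G(z, c).

    z: noise vector; here z[0] only shifts the base pitch (assumption).
    c: continuous conditioning vector; here c[0] scales the second
       harmonic, a crude proxy for a timbre feature (assumption).
    """
    f0 = 200.0 + 50.0 * z[0]
    brightness = c[0]
    return [
        math.sin(2 * math.pi * f0 * n / sample_rate)
        + brightness * math.sin(2 * math.pi * 2 * f0 * n / sample_rate)
        for n in range(n_samples)
    ]


def sweep_condition(z, c_start, c_end, steps=5):
    """Hold z fixed and walk the conditioning vector from c_start to c_end."""
    outputs = []
    for i in range(steps):
        t = i / (steps - 1)
        c = lerp(c_start, c_end, t)
        outputs.append((c, toy_generator(z, c)))
    return outputs


random.seed(0)
z = [random.gauss(0.0, 1.0)]  # one fixed noise draw shared by every step
sweep = sweep_condition(z, c_start=[0.0], c_end=[1.0], steps=5)
for c, audio in sweep:
    energy = sum(s * s for s in audio) / len(audio)
    print(f"c={c[0]:.2f}  mean energy={energy:.3f}")
```

With a real model, the same sweep would call the trained generator in place of `toy_generator`, and the printed energy would be replaced by whichever acoustic feature is being controlled.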

Code & Models

Repositories

Reinliu/ICGAN
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing