ICGAN: An implicit conditioning method for interpretable feature control of neural audio synthesis
Yunyi Liu, Craig Jin

TL;DR
This paper introduces ICGAN, a novel implicit conditioning approach for neural audio synthesis that enables interpretable and continuous control over sound features without relying on explicit labels, improving sound manipulation capabilities.
Contribution
It presents a new implicit conditioning method using GANs that creates a continuous feature space for controllable sound synthesis without explicit labels.
Findings
Effective timbre manipulation demonstrated
Controllable sound variation achieved in-domain and cross-domain
Introduces an evaluation metric for controllability
Abstract
Neural audio synthesis methods can achieve high-fidelity and realistic sound generation by utilizing deep generative models. Such models typically rely on external labels, which are often discrete, as conditioning information to achieve guided sound generation. However, it remains difficult to control subtle changes in sounds without appropriate and descriptive labels, especially given a limited dataset. This paper proposes an implicit conditioning method for neural audio synthesis using generative adversarial networks that allows for interpretable control of the acoustic features of synthesized sounds. Our technique creates a continuous conditioning space that enables timbre manipulation without relying on explicit labels. We further introduce an evaluation metric to explore controllability and demonstrate that our approach is effective in enabling a degree of controlled variation of…
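To make the idea of a continuous conditioning space concrete, the following is a minimal sketch and not the authors' implementation. It assumes a generator of the form G(z, c) with a noise vector z and a continuous conditioning vector c. The names `toy_generator` and `interpolate_conditioning`, the dimensions, and the random linear map standing in for a trained network are all hypothetical. It only illustrates how holding z fixed and sweeping c along a straight line would produce a sequence of gradually varying outputs.

```python
import numpy as np

def interpolate_conditioning(c_start, c_end, steps):
    """Linearly interpolate between two conditioning vectors (one row per step)."""
    alphas = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - alphas) * c_start + alphas * c_end

def toy_generator(z, c, weights):
    """Hypothetical stand-in for a trained generator G(z, c); not ICGAN's network."""
    return np.tanh(np.concatenate([z, c]) @ weights)

rng = np.random.default_rng(0)
noise_dim, cond_dim, out_dim = 8, 2, 16  # assumed sizes, chosen for illustration
weights = rng.normal(size=(noise_dim + cond_dim, out_dim))

z = rng.normal(size=noise_dim)  # noise is held fixed across the sweep
c_path = interpolate_conditioning(np.zeros(cond_dim), np.ones(cond_dim), steps=5)

# One output frame per point on the conditioning path.
frames = np.stack([toy_generator(z, c, weights) for c in c_path])
```

In a real system the rows of `frames` would be audio or spectrogram outputs, and an acoustic feature measured on them could be compared against the position along `c_path` to probe controllability.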
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
