GANStrument: Adversarial Instrument Sound Synthesis with Pitch-invariant Instance Conditioning
Gaku Narita, Junichi Shimizu, Taketo Akama

TL;DR
GANStrument is a generative adversarial model that synthesizes pitched instrument sounds from a one-shot input sound, fast enough for interactive use, while reflecting the input's timbre. Instance conditioning improves fidelity, diversity, and generalization to varied inputs.
Contribution
It combines instance conditioning with an adversarially trained pitch-invariant feature extractor to improve the synthesis quality and generalization of GAN-based instrument sound models.
Findings
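Outperforms strong baselines that do not use instance conditioning in generation quality and input editability
Instance conditioning improves the fidelity and diversity of synthesized sounds and generalization to various inputs
Adversarial training of the pitch-invariant feature extractor significantly improves pitch accuracy and timbre consistency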
Abstract
We propose GANStrument, a generative adversarial model for instrument sound synthesis. Given a one-shot sound as input, it is able to generate pitched instrument sounds that reflect the timbre of the input within an interactive time. By exploiting instance conditioning, GANStrument achieves better fidelity and diversity of synthesized sounds and generalization ability to various inputs. In addition, we introduce an adversarial training scheme for a pitch-invariant feature extractor that significantly improves the pitch accuracy and timbre consistency. Experimental results show that GANStrument outperforms strong baselines that do not use instance conditioning in terms of generation quality and input editability. Qualitative examples are available online.
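The abstract names instance conditioning without describing its mechanics. The sketch below illustrates one common reading of the idea, assuming the neighborhood-based formulation used in instance-conditioned GANs: the generator is conditioned on the feature vector of one instance, and the discriminator is shown a neighbor of that instance in feature space as the real sample. The function names, the toy 2-D features, and the choice of cosine similarity are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(features, index, k):
    """Indices of the k vectors most similar to features[index].

    The instance itself is included, since it is its own closest neighbor.
    """
    sims = [(cosine_similarity(features[index], f), j)
            for j, f in enumerate(features)]
    # Sort by descending similarity; break ties by index for determinism.
    sims.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in sims[:k]]


def sample_training_pair(features, index, k, rng):
    """Pick (conditioning feature, index of the 'real' sample).

    Hypothetical helper: the conditioning vector comes from the chosen
    instance, and the real sample is drawn from its k nearest neighbors.
    """
    neighbors = nearest_neighbors(features, index, k)
    return features[index], rng.choice(neighbors)


# Toy features standing in for timbre embeddings (assumed, not real data).
features = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]
rng = random.Random(0)
cond, real_idx = sample_training_pair(features, index=0, k=2, rng=rng)
```

In a full model under this assumption, `cond` would be fed to the generator together with noise and a pitch label, while the sound at `real_idx` would be the real example for the discriminator. A pitch-invariant feature extractor matters here because neighbors should be close in timbre rather than in pitch.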
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Model Reduction and Neural Networks
