HyperGANStrument: Instrument Sound Synthesis and Editing with Pitch-Invariant Hypernetworks
Zhe Zhang, Taketo Akama

TL;DR
HyperGANStrument introduces a pitch-invariant hypernetwork that modulates the weights of a pre-trained GAN-based instrument sound synthesizer (GANStrument). It also fine-tunes this hypernetwork adversarially. Together, these changes improve reconstruction fidelity, editability, and generation diversity.
Contribution
It presents a novel pitch-invariant hypernetwork approach that enhances a GAN-based instrument sound synthesizer's reconstruction and editing capabilities.
Findings
Improved sound reconstruction fidelity and diversity.
Enhanced editability of synthesized instrument sounds.
Experiments show significant performance gains over the original GANStrument.
Abstract
GANStrument, which combines GANs with a pitch-invariant feature extractor and an instance conditioning technique, has shown remarkable capabilities in synthesizing realistic instrument sounds. Our goal is to further improve its reconstruction ability and pitch accuracy, and thereby the editability of user-provided sounds. To this end, we propose HyperGANStrument, which introduces a pitch-invariant hypernetwork that modulates the weights of a pre-trained GANStrument generator, given a one-shot sound as input. The hypernetwork modulation provides feedback to the generator when it reconstructs the input sound. In addition, we adopt an adversarial fine-tuning scheme for the hypernetwork to improve the reconstruction fidelity and generation diversity of the generator. Experimental results show that the proposed model not only enhances the generation capability of GANStrument but also significantly improves the…
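The abstract describes the weight modulation only at a high level. The minimal pure-Python sketch below makes the idea concrete under stated assumptions; it is not the authors' implementation.

The main assumption is the form of the modulation, which the abstract does not specify. The sketch uses a multiplicative scheme, W' = W * (1 + delta), with one offset per output channel. The functions `hypernetwork`, `modulate`, and `generator_layer`, and all sizes and values, are hypothetical. A fixed vector stands in for the output of the pitch-invariant feature extractor.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def hypernetwork(embedding: Vector, hyper_weights: Matrix) -> Vector:
    """Hypothetical linear hypernetwork: maps a sound embedding to one
    multiplicative offset per output channel of a generator layer."""
    return [sum(h * e for h, e in zip(row, embedding)) for row in hyper_weights]


def modulate(weight: Matrix, offsets: Vector) -> Matrix:
    """Assumed modulation (not confirmed by the abstract): W' = W * (1 + delta).
    Every weight in output-channel row i is scaled by (1 + offsets[i]);
    a new matrix is returned, so the pre-trained weights stay untouched."""
    return [[w * (1.0 + d) for w in row] for row, d in zip(weight, offsets)]


def generator_layer(weight: Matrix, latent: Vector) -> Vector:
    """A single linear layer standing in for one block of the pre-trained generator."""
    return [sum(w * z for w, z in zip(row, latent)) for row in weight]


# Frozen pre-trained weights: 2 output channels, 3 inputs (1 noise value + 2-pitch one-hot).
pretrained = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
# Stand-in for the pitch-invariant embedding of a one-shot input sound.
embedding = [0.2, -0.1]
# Hypernetwork parameters: the part that fine-tuning would update.
hyper_weights = [[0.5, 0.0], [0.0, 1.0]]

offsets = hypernetwork(embedding, hyper_weights)  # [0.1, -0.1]
adapted = modulate(pretrained, offsets)

# The hypernetwork never sees pitch, so one set of adapted weights serves every pitch;
# pitch reaches the generator only through its own conditioning input.
for pitch_onehot in ([1.0, 0.0], [0.0, 1.0]):
    print(generator_layer(adapted, [1.0] + pitch_onehot))
```

In this sketch only `hyper_weights` would be trained, and `pretrained` stays frozen. That mirrors the general hypernetwork idea of adapting a fixed generator to each input sound. The adversarial fine-tuning described in the abstract is not modeled here.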
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
Methods: HyperNetwork
