GANStrument: Adversarial Instrument Sound Synthesis with Pitch-invariant Instance Conditioning

Gaku Narita; Junichi Shimizu; Taketo Akama

arXiv:2211.05385 · cs.SD · March 8, 2023

Open Access

TL;DR

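GANStrument is a generative adversarial model that takes a single (one-shot) sound as input and synthesizes pitched instrument sounds reflecting that input's timbre, fast enough for interactive use. Instance conditioning improves the fidelity and diversity of the output and the model's generalization to varied inputs.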

Contribution

The paper applies instance conditioning to GAN-based instrument sound synthesis and introduces an adversarial training scheme for a pitch-invariant feature extractor, which improves pitch accuracy and timbre consistency.

Findings

01

Outperforms strong baselines that do not use instance conditioning in generation quality (fidelity and diversity)

02

Adversarial training of the pitch-invariant feature extractor improves pitch accuracy and timbre consistency

03

Demonstrates improved input editability and generalization

Abstract

We propose GANStrument, a generative adversarial model for instrument sound synthesis. Given a one-shot sound as input, it is able to generate pitched instrument sounds that reflect the timbre of the input within an interactive time. By exploiting instance conditioning, GANStrument achieves better fidelity and diversity of synthesized sounds and generalization ability to various inputs. In addition, we introduce an adversarial training scheme for a pitch-invariant feature extractor that significantly improves the pitch accuracy and timbre consistency. Experimental results show that GANStrument outperforms strong baselines that do not use instance conditioning in terms of generation quality and input editability. Qualitative examples are available online.
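The abstract does not spell out how instance conditioning is set up. As a rough illustration only, the sketch below shows the neighborhood idea commonly used in instance-conditioned GANs: each conditioning instance is paired with "real" samples drawn from its nearest neighbors in a feature space. The toy 2-D features, the cosine-similarity metric, and the function names `nearest_neighbors` and `sample_training_pair` are all assumptions made for this example, not details taken from the paper.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(features, index, k):
    """Indices of the k items most similar to features[index] (itself included)."""
    query = features[index]
    ranked = sorted(
        range(len(features)),
        key=lambda j: cosine_similarity(query, features[j]),
        reverse=True,
    )
    return ranked[:k]


def sample_training_pair(features, k, rng):
    """Pick a conditioning instance i and a 'real' sample j from i's neighborhood.

    In an instance-conditioned setup, the generator would be conditioned on
    features[i], while the discriminator would see sample j as the real example.
    """
    i = rng.randrange(len(features))
    j = rng.choice(nearest_neighbors(features, i, k))
    return i, j


# Hypothetical 2-D "timbre features" for five sounds (illustrative values only).
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0], [0.1, 0.9]]
neighbors_of_0 = nearest_neighbors(toy_features, 0, 3)
```

In this toy data, sounds 0 to 2 form one cluster and sounds 3 and 4 another, so a sample conditioned on sound 0 is only ever paired with reals from the first cluster. If the features were pitch-invariant, as the abstract's feature extractor aims to be, such neighborhoods would group sounds by timbre rather than by pitch.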

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing · Model Reduction and Neural Networks