Time-Varying Audio Effect Modeling by End-to-End Adversarial Training
Yann Bourdin, Pierrick Legrand, Fanny Roche

TL;DR
This paper presents a novel GAN-based approach for modeling time-varying audio effects solely from input-output recordings, eliminating the need for control signal extraction and enabling black-box modeling of dynamic audio effects.
Contribution
It introduces a two-stage training strategy with adversarial learning and state prediction for effective time-varying effect modeling without control signals.
Findings
Successfully models a vintage hardware phaser's time-varying behavior
Develops a new chirp-train based metric for modulation accuracy
Demonstrates improved black-box modeling of dynamic audio effects
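For readers unfamiliar with the term, a chirp train is a sequence of short frequency sweeps separated by silence, so that each sweep samples the effect's response at a different moment. The summary above does not describe how the paper's metric is computed, so the following is only a minimal sketch of generating such a test signal; the function names and every parameter value (sample rate, sweep range, chirp length, gap) are illustrative assumptions, not values from the paper.

```python
import math

def linear_chirp(f_start_hz, f_end_hz, n_samples, sample_rate):
    """One linear sine sweep from f_start_hz to f_end_hz (hypothetical helper)."""
    duration_s = n_samples / sample_rate
    sweep_rate = (f_end_hz - f_start_hz) / duration_s  # Hz per second
    chirp = []
    for n in range(n_samples):
        t = n / sample_rate
        # Phase is the integral of the instantaneous frequency.
        phase = 2.0 * math.pi * (f_start_hz * t + 0.5 * sweep_rate * t * t)
        chirp.append(math.sin(phase))
    return chirp

def chirp_train(n_chirps, chirp, gap_samples):
    """Repeat the same chirp n_chirps times, with silence between repeats."""
    train = []
    for _ in range(n_chirps):
        train.extend(chirp)
        train.extend([0.0] * gap_samples)
    return train

# Assumed, illustrative settings (not taken from the paper).
sample_rate = 16000
chirp = linear_chirp(100.0, 6000.0, n_samples=160, sample_rate=sample_rate)
train = chirp_train(n_chirps=50, chirp=chirp, gap_samples=40)
```

Passing a signal like `train` through a time-varying effect and analysing each chirp's response separately gives a snapshot of the effect at successive instants, which is what makes it possible to compare modulation behavior between a device and its model.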
Abstract
Deep learning has become a standard approach for the modeling of audio effects, yet strictly black-box modeling remains problematic for time-varying systems. Unlike time-invariant effects, training models on devices with internal modulation typically requires the recording or extraction of control signals to ensure the time-alignment required by standard loss functions. This paper introduces a Generative Adversarial Network (GAN) framework to model such effects using only input-output audio recordings, removing the need for modulation signal extraction. We propose a convolutional-recurrent architecture trained via a two-stage strategy: an initial adversarial phase allows the model to learn the distribution of the modulation behavior without strict phase constraints, followed by a supervised fine-tuning phase where a State Prediction Network (SPN) estimates the initial internal states…
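The abstract's point about time-alignment can be illustrated with a toy example. The sketch below uses a simple tremolo (amplitude modulation by a low-frequency oscillator) as a stand-in for a time-varying effect; it is not the phaser or the architecture from the paper, and all names and parameter values are assumptions chosen for illustration. It shows that two outputs of the same effect, differing only in the starting phase of the internal modulation, produce a large sample-wise error.

```python
import math

def tremolo(x, lfo_rate_hz, lfo_phase, sample_rate, depth=0.9):
    """Toy time-varying effect: gain modulated by a sinusoidal LFO (assumed stand-in)."""
    out = []
    for n, sample in enumerate(x):
        lfo = math.sin(2.0 * math.pi * lfo_rate_hz * n / sample_rate + lfo_phase)
        gain = 1.0 - depth * 0.5 * (1.0 + lfo)  # gain stays within [1 - depth, 1]
        out.append(gain * sample)
    return out

def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

sample_rate = 8000
# One second of a 440 Hz sine as the dry input.
x = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate) for n in range(sample_rate)]

target = tremolo(x, 2.0, 0.0, sample_rate)          # "recorded" device output
aligned = tremolo(x, 2.0, 0.0, sample_rate)         # same effect, same LFO phase
misaligned = tremolo(x, 2.0, math.pi, sample_rate)  # same effect, LFO phase offset

err_aligned = mse(target, aligned)        # zero: modulation is time-aligned
err_misaligned = mse(target, misaligned)  # large, although the effect is identical
```

A sample-wise loss would therefore penalise a model that reproduces the effect correctly but starts its modulation at a different phase, which is the difficulty the adversarial first stage and the state prediction in the second stage are described as addressing.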
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing · Music and Audio Processing
