Time-Varying Audio Effect Modeling by End-to-End Adversarial Training

Yann Bourdin; Pierrick Legrand; Fanny Roche

arXiv:2512.15313·cs.SD·December 18, 2025

Open Access

TL;DR

This paper presents a novel GAN-based approach for modeling time-varying audio effects solely from input-output recordings, eliminating the need for control signal extraction and enabling black-box modeling of dynamic audio effects.

Contribution

The paper introduces a two-stage training strategy: an adversarial phase followed by supervised fine-tuning with a State Prediction Network (SPN), which allows time-varying effects to be modeled without control signals.

Findings

01. Successfully models a vintage hardware phaser's time-varying behavior

02. Develops a new chirp-train based metric for modulation accuracy

03. Demonstrates improved black-box modeling of dynamic audio effects

Abstract

Deep learning has become a standard approach for the modeling of audio effects, yet strictly black-box modeling remains problematic for time-varying systems. Unlike time-invariant effects, training models on devices with internal modulation typically requires the recording or extraction of control signals to ensure the time-alignment required by standard loss functions. This paper introduces a Generative Adversarial Network (GAN) framework to model such effects using only input-output audio recordings, removing the need for modulation signal extraction. We propose a convolutional-recurrent architecture trained via a two-stage strategy: an initial adversarial phase allows the model to learn the distribution of the modulation behavior without strict phase constraints, followed by a supervised fine-tuning phase where a State Prediction Network (SPN) estimates the initial internal states…
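To illustrate why standard loss functions need time-alignment for modulated effects, here is a minimal sketch. It does not reproduce the paper's phaser or its model: the `tremolo` function is a hypothetical toy effect (amplitude modulation by a sinusoidal LFO), and the sample rate, frequencies, and phase values are arbitrary assumptions. It compares the sample-wise mean squared error between two renderings of the same effect, once with matching LFO phase and once with the LFO shifted by half a cycle.

```python
import math


def tremolo(x, sample_rate, lfo_hz, lfo_phase):
    """Toy time-varying effect (assumption, not the paper's phaser):
    amplitude modulation by a sinusoidal LFO with a given initial phase."""
    out = []
    for n, sample in enumerate(x):
        lfo = math.sin(2.0 * math.pi * lfo_hz * n / sample_rate + lfo_phase)
        gain = 0.5 + 0.5 * lfo  # gain swings between 0 and 1
        out.append(gain * sample)
    return out


def mse(a, b):
    """Sample-wise mean squared error between two equal-length signals."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


SAMPLE_RATE = 8000  # arbitrary choice for the sketch

# One second of a 440 Hz sine as the dry input signal.
x = [math.sin(2.0 * math.pi * 440.0 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

# "Device" recording, plus two candidate outputs with the same LFO rate and depth.
target = tremolo(x, SAMPLE_RATE, lfo_hz=2.0, lfo_phase=0.0)
aligned = tremolo(x, SAMPLE_RATE, lfo_hz=2.0, lfo_phase=0.0)
shifted = tremolo(x, SAMPLE_RATE, lfo_hz=2.0, lfo_phase=math.pi)

aligned_error = mse(target, aligned)  # modulation in phase with the target
shifted_error = mse(target, shifted)  # same effect, LFO half a cycle off

print(f"aligned MSE: {aligned_error:.4f}")
print(f"shifted MSE: {shifted_error:.4f}")
```

In this toy setting the shifted output applies exactly the same modulation rate and depth as the target, yet a sample-wise loss penalizes it heavily because the LFO phases disagree. This is the kind of mismatch that, per the abstract, the adversarial phase sidesteps by not imposing strict phase constraints.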

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing · Music and Audio Processing