TL;DR
This paper introduces a data-driven method for integrating black-box audio effects into neural networks. It enables automatic audio processing tasks such as tube amplifier emulation, breath and pop removal, and music mastering, with mastering results comparable to commercial solutions.
Contribution
It presents a novel framework that uses non-differentiable black-box audio effects as layers within neural networks by approximating their gradients, allowing end-to-end training for a range of audio processing applications.
Findings
Effective emulation of tube amplifiers.
Automatic removal of breaths and pops from voice recordings.
Music mastering results comparable to commercial solutions.
Abstract
We present a data-driven approach to automate audio signal processing by incorporating stateful third-party audio effects as layers within a deep neural network. We then train a deep encoder to analyze input audio and control effect parameters to perform the desired signal manipulation, requiring only input-target paired audio data as supervision. To train our network with non-differentiable black-box effects layers, we use a fast, parallel stochastic gradient approximation scheme within a standard auto differentiation graph, yielding efficient end-to-end backpropagation. We demonstrate the power of our approach with three separate automatic audio production applications: tube amplifier emulation, automatic removal of breaths and pops from voice recordings, and automatic music mastering. We validate our results with a subjective listening test, showing our approach not only can enable…
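The abstract does not name its gradient approximation scheme. The sketch below assumes a simultaneous perturbation stochastic approximation (SPSA) estimator, a standard zeroth-order method that fits the description. Backpropagation through an effect layer only needs the vector-Jacobian product J^T v, where v is the loss gradient arriving from the layers above. SPSA estimates this product by perturbing all effect parameters at once with a random +/-1 vector and calling the effect twice. `soft_clipper` and `spsa_param_vjp` are hypothetical names, and the clipper is only a toy stand-in for a third-party plugin. The gradient with respect to the input audio, which a full layer would also need, is omitted.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Signal = List[float]
Effect = Callable[[Sequence[float], Sequence[float]], Signal]


def soft_clipper(x: Sequence[float], params: Sequence[float]) -> Signal:
    """Hypothetical black-box effect: gain followed by tanh soft clipping.

    The estimator below only calls this function; it never differentiates it.
    """
    gain, ceiling = params
    return [ceiling * math.tanh(gain * s / ceiling) for s in x]


def spsa_param_vjp(
    effect: Effect,
    x: Sequence[float],
    params: Sequence[float],
    upstream: Sequence[float],
    eps: float = 1e-3,
    num_samples: int = 1,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Estimate J^T @ upstream, with J = d effect(x, params) / d params (SPSA sketch).

    Each sample costs two effect calls regardless of the number of parameters;
    averaging several samples reduces the variance of the estimate.
    """
    if rng is None:
        rng = random.Random()
    grad = [0.0] * len(params)
    for _ in range(num_samples):
        # Rademacher perturbation: every parameter is nudged by +eps or -eps.
        delta = [rng.choice((-1.0, 1.0)) for _ in params]
        y_plus = effect(x, [p + eps * d for p, d in zip(params, delta)])
        y_minus = effect(x, [p - eps * d for p, d in zip(params, delta)])
        # Central difference of the scalar <upstream, effect(x, params)> along delta.
        slope = sum(u * (a - b) for u, a, b in zip(upstream, y_plus, y_minus)) / (2 * eps)
        # SPSA divides by each perturbation entry.
        for i, d in enumerate(delta):
            grad[i] += slope / d
    return [g / num_samples for g in grad]


if __name__ == "__main__":
    audio = [0.1, -0.4, 0.8]
    dloss_dy = [1.0, 0.5, -0.2]  # made-up upstream gradient for illustration
    print(spsa_param_vjp(soft_clipper, audio, [2.0, 0.9], dloss_dy,
                         num_samples=64, rng=random.Random(0)))
```

The two effect calls in each sample are independent of each other and of other samples, so they can be dispatched in parallel; this sketch runs them serially for clarity.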
