INSTRUMENTAL: Automatic Synthesizer Parameter Recovery from Audio via Evolutionary Optimization
Philipp Bogdan

TL;DR
Instrumental is a system that recovers continuous synthesizer parameters from audio recordings using evolutionary optimization, so that an instrument's timbre, not just its notes, can be modeled.
Contribution
It couples a differentiable 28-parameter subtractive synthesizer with CMA-ES, a derivative-free evolutionary optimizer, for parameter recovery, and shows that evolutionary optimization outperforms gradient descent on this task.
Findings
CMA-ES outperforms gradient descent on this non-convex loss landscape.
Of eight hypotheses tested for improving convergence, only parametric EQ boosting yields a meaningful improvement.
Spectral analysis initialization accelerates convergence over random starts.
Abstract
Existing audio-to-MIDI tools extract notes but discard the timbral characteristics that define an instrument's identity. We present Instrumental, a system that recovers continuous synthesizer parameters from audio by coupling a differentiable 28-parameter subtractive synthesizer with CMA-ES, a derivative-free evolutionary optimizer. We optimize a composite perceptual loss combining mel-scaled STFT, spectral centroid, and MFCC divergence, achieving a matching loss of 2.09 on real recorded audio. We systematically evaluate eight hypotheses for improving convergence and find that only parametric EQ boosting yields meaningful improvement. Our results show that CMA-ES outperforms gradient descent on this non-convex landscape, that more parameters do not monotonically improve matching, and that spectral analysis initialization accelerates convergence over random starts.
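The loop described in the abstract (render a candidate, score it against the target with a composite spectral loss, let a derivative-free optimizer propose the next candidates) can be sketched roughly as follows. This is an illustration only, under several assumptions that are not taken from the paper: `toy_synth` is a hypothetical 2-parameter additive synth standing in for the 28-parameter subtractive synthesizer, the loss keeps only a log-magnitude term and a spectral-centroid term (no mel scaling, no MFCC term), the 0.1 weight is arbitrary, and the optimizer is a simplified elitist evolution strategy with a fixed step-size decay rather than CMA-ES (there is no covariance adaptation).

```python
import cmath
import math
import random

SR = 8000    # sample rate in Hz (assumed for this toy example)
N = 64       # frame length; bin spacing is SR / N = 125 Hz
F0 = 250.0   # fixed fundamental, chosen to land exactly on DFT bin 2


def toy_synth(params):
    """Hypothetical synth: harmonic k has amplitude gain * rolloff**(k - 1)."""
    gain, rolloff = params
    return [
        sum(gain * rolloff ** (k - 1) * math.sin(2 * math.pi * k * F0 * n / SR)
            for k in range(1, 9))
        for n in range(N)
    ]


def magnitude_spectrum(signal):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for one short frame)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * b * i / n)
                for i, x in enumerate(signal)))
        for b in range(n // 2 + 1)
    ]


def spectral_centroid(mag):
    """Magnitude-weighted mean bin index."""
    total = sum(mag)
    return sum(b * m for b, m in enumerate(mag)) / total if total > 0 else 0.0


def composite_loss(params, target_mag):
    """Log-magnitude distance plus weighted centroid distance (weights assumed)."""
    mag = magnitude_spectrum(toy_synth(params))
    log_term = sum((math.log1p(a) - math.log1p(b)) ** 2
                   for a, b in zip(mag, target_mag)) / len(mag)
    centroid_term = abs(spectral_centroid(mag) - spectral_centroid(target_mag))
    return log_term + 0.1 * centroid_term


def evolve(loss_fn, start, sigma=0.3, pop=12, elite=3, generations=40, seed=0):
    """Simplified evolution strategy: sample around a mean, average the elites."""
    rng = random.Random(seed)
    mean = list(start)
    best, best_loss = list(start), loss_fn(start)
    for _ in range(generations):
        scored = []
        for _ in range(pop):
            # Gaussian perturbation, clamped to the normalized range [0, 1]
            cand = [min(1.0, max(0.0, m + sigma * rng.gauss(0, 1))) for m in mean]
            scored.append((loss_fn(cand), cand))
        scored.sort(key=lambda item: item[0])
        if scored[0][0] < best_loss:
            best_loss, best = scored[0]
        # New mean is the average of the elite candidates
        mean = [sum(c[d] for _, c in scored[:elite]) / elite
                for d in range(len(mean))]
        sigma *= 0.9  # fixed decay; CMA-ES would adapt step size and covariance
    return best, best_loss


# "Recorded" target rendered from known parameters, so recovery can be checked.
target_mag = magnitude_spectrum(toy_synth([0.8, 0.5]))
start = [0.2, 0.9]
start_loss = composite_loss(start, target_mag)
best, best_loss = evolve(lambda p: composite_loss(p, target_mag), start)
print("start loss:", start_loss)
print("best loss:", best_loss, "at params:", best)
```

No gradient of the synth or loss is used anywhere in the loop, which is the property that makes this family of optimizers usable on non-convex landscapes. A real setup would more plausibly hand the same loss function to an existing CMA-ES implementation than hand-roll the update.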
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Speech and Audio Processing
