Synthesizer Sound Matching Using Audio Spectrogram Transformers

Fred Bruford, Frederik Blang, and Shahan Nercessian

arXiv:2407.16643 · eess.AS · July 24, 2024

TL;DR

This paper presents an Audio Spectrogram Transformer-based model for synthesizer sound matching, which sets a synthesizer's parameters to emulate an input sound and outperforms MLP and CNN baselines.

Contribution

Introduces a synthesizer sound matching model based on the Audio Spectrogram Transformer that assumes little about the underlying synthesis architecture and improves fidelity over MLP and CNN baselines.

Findings

1. The model outperforms MLP and CNN baselines in parameter reconstruction.

2. It can emulate vocal imitations and sounds from various synthesizers.

3. It demonstrates robustness in out-of-domain sound matching.

Abstract

Systems for synthesizer sound matching, which automatically set the parameters of a synthesizer to emulate an input sound, have the potential to make the process of synthesizer programming faster and easier for novice and experienced musicians alike, whilst also affording new means of interaction with synthesizers. Considering the enormous variety of synthesizers in the marketplace, and the complexity of many of them, general-purpose sound matching systems that function with minimal knowledge or prior assumptions about the underlying synthesis architecture are particularly desirable. With this in mind, we introduce a synthesizer sound matching model based on the Audio Spectrogram Transformer. We demonstrate the viability of this model by training on a large synthetic dataset of randomly generated samples from the popular Massive synthesizer. We show that this model can reconstruct…

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis

Methods: Sparse Evolutionary Training · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Softmax · Attention Is All You Need · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention