Synthesizer Preset Interpolation using Transformer Auto-Encoders
Gwendal Le Vaillant, Thierry Dutoit

TL;DR
This paper presents a transformer-based auto-encoder for smooth interpolation between sound synthesizer presets, so that new sounds can be created intuitively from existing ones. Once trained, the model can be integrated into commercial synthesizers.
Contribution
It introduces a bimodal auto-encoder that processes presets with multi-head attention blocks and audio with convolutions, and that interpolates more smoothly than the related architectures and methods it was compared against.
Findings
The model achieves smoother interpolations than related architectures.
It was tested on a popular frequency modulation (FM) synthesizer with more than one hundred parameters.
After training, it can be integrated into commercial synthesizers for live interpolation or sound design tasks.
Abstract
Sound synthesizers are widespread in modern music production but they increasingly require expert skills to be mastered. This work focuses on interpolation between presets, i.e., sets of values of all sound synthesis parameters, to enable the intuitive creation of new sounds from existing ones. We introduce a bimodal auto-encoder neural network, which simultaneously processes presets using multi-head attention blocks, and audio using convolutions. This model has been tested on a popular frequency modulation synthesizer with more than one hundred parameters. Experiments have compared the model to related architectures and methods, and have demonstrated that it performs smoother interpolations. After training, the proposed model can be integrated into commercial synthesizers for live interpolation or sound design tasks.
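The interpolation idea described in the abstract can be illustrated with a minimal sketch: encode two presets into a latent space, interpolate linearly between the two latent vectors, and decode each intermediate point back into a preset. This is not the authors' implementation. The function names (`lerp`, `interpolate_presets`, `toy_encode`, `toy_decode`), the choice of linear interpolation in the latent space, and the normalization of parameters to [0, 1] are all assumptions made for illustration, and the toy encoder and decoder merely stand in for a trained model.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Vector:
    """Linearly interpolate between two equal-length vectors, with t in [0, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolate_presets(
    preset_a: Sequence[float],
    preset_b: Sequence[float],
    encode: Callable[[Sequence[float]], Vector],
    decode: Callable[[Sequence[float]], Vector],
    num_steps: int,
) -> List[Vector]:
    """Return num_steps presets going from preset_a to preset_b.

    Interpolation happens in the latent space given by `encode`;
    each intermediate latent vector is mapped back with `decode`.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    z_a, z_b = encode(preset_a), encode(preset_b)
    steps = []
    for i in range(num_steps):
        t = i / (num_steps - 1)
        steps.append(decode(lerp(z_a, z_b, t)))
    return steps


# Hypothetical stand-ins for a trained encoder and decoder.
# They only rescale parameters between [0, 1] and [-1, 1].
def toy_encode(preset: Sequence[float]) -> Vector:
    return [2.0 * p - 1.0 for p in preset]


def toy_decode(latent: Sequence[float]) -> Vector:
    # Clamp so that decoded parameters stay in the valid [0, 1] range.
    return [min(1.0, max(0.0, (z + 1.0) / 2.0)) for z in latent]


# Two toy presets with three normalized parameters each (assumed values).
preset_a = [0.0, 0.5, 1.0]
preset_b = [1.0, 0.5, 0.0]
path = interpolate_presets(preset_a, preset_b, toy_encode, toy_decode, num_steps=5)
for step in path:
    print(step)
```

Because the toy encoder is linear, this path coincides with plain interpolation in parameter space. With a trained non-linear auto-encoder, the decoded path would generally differ from that baseline, which is where a learned latent space could yield smoother-sounding transitions.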
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Music Technology and Sound Studies
Methods: Linear Layer · Softmax
