TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument

Kyungsu Kim; Junghyun Koo; Sungho Lee; Haesun Joung; Kyogu Lee

arXiv:2502.08939 · cs.SD · February 14, 2025

TL;DR

TokenSynth is a neural audio synthesizer that uses token-based representations and transformer models to perform instrument cloning, text-to-instrument synthesis, and timbre manipulation without fine-tuning, enabling flexible sound design.

Contribution

It introduces a novel token-based neural synthesizer leveraging transformer architecture for versatile audio generation tasks without fine-tuning.

Findings

01 High-quality audio synthesis demonstrated
02 Effective timbral similarity achieved
03 Accurate MIDI following in synthesis

Abstract

Recent advancements in neural audio codecs have enabled the use of tokenized audio representations in various audio generation tasks, such as text-to-speech, text-to-audio, and text-to-music generation. Leveraging this approach, we propose TokenSynth, a novel neural synthesizer that uses a decoder-only transformer to generate the desired audio tokens from MIDI tokens and a CLAP (Contrastive Language-Audio Pretraining) embedding, which carries timbre-related information. Our model can perform instrument cloning, text-to-instrument synthesis, and text-guided timbre manipulation without any fine-tuning. This flexibility enables diverse sound design and intuitive timbre control. We evaluated the quality of the synthesized audio, the timbral similarity between synthesized and target audio/text, and synthesis accuracy (i.e., how accurately it follows the input MIDI) using objective…
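The conditioning scheme described in the abstract (a timbre embedding plus MIDI tokens in, audio tokens out, one token at a time) can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the CLAP embedding is injected, how the sequence is laid out, or which codec is used, so the prefix layout, `AUDIO_VOCAB_SIZE`, `EOS_TOKEN`, and every function name below are hypothetical. The trained transformer is replaced by a seeded random stand-in so the example runs with the standard library alone.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
AUDIO_VOCAB_SIZE = 1024          # assumed number of codec token ids
EOS_TOKEN = AUDIO_VOCAB_SIZE     # assumed end-of-sequence id

PrefixItem = Tuple[str, object]
NextTokenFn = Callable[[Sequence[PrefixItem], Sequence[int]], int]


def build_prefix(timbre_embedding: Sequence[float],
                 midi_tokens: Sequence[int]) -> List[PrefixItem]:
    """Assumed layout: one timbre slot followed by the MIDI tokens."""
    prefix: List[PrefixItem] = [("timbre", tuple(timbre_embedding))]
    prefix.extend(("midi", token) for token in midi_tokens)
    return prefix


def generate_audio_tokens(prefix: Sequence[PrefixItem],
                          next_token: NextTokenFn,
                          max_len: int) -> List[int]:
    """Autoregressive loop: each step sees the prefix and all tokens so far."""
    audio: List[int] = []
    for _ in range(max_len):
        token = next_token(prefix, audio)
        if token == EOS_TOKEN:
            break
        audio.append(token)
    return audio


def dummy_next_token(prefix: Sequence[PrefixItem],
                     audio_so_far: Sequence[int]) -> int:
    """Stand-in for a trained decoder-only transformer (seeded random choice)."""
    rng = random.Random(len(prefix) * 1000 + len(audio_so_far))
    return rng.randrange(AUDIO_VOCAB_SIZE + 1)  # may return EOS_TOKEN


# Toy inputs: a 3-dimensional "embedding" and three MIDI-like token ids.
example_prefix = build_prefix([0.1, -0.3, 0.7], [60, 64, 67])
example_audio = generate_audio_tokens(example_prefix, dummy_next_token, max_len=16)
print(len(example_prefix), len(example_audio))
```

Under this framing, instrument cloning and text-to-instrument synthesis would presumably differ only in where the timbre embedding comes from (a reference recording or a text prompt), since CLAP embeds audio and text in a shared space; the decoding loop itself stays the same.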

Code & Models

Repositories

kyungsukim42/tokensynth (PyTorch, official)

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing · Image Processing and 3D Reconstruction