TimbreCLIP: Connecting Timbre to Text and Images

Nicolas Jonason; Bob L.T. Sturm

arXiv:2211.11225 · cs.SD · November 22, 2022 · 1 citation

Open Access

TL;DR

TimbreCLIP introduces a cross-modal embedding that connects musical instrument timbre to text and images, enabling applications such as text-driven audio equalization and timbre-to-image synthesis.

Contribution

The paper presents an audio-text embedding trained on single instrument notes and demonstrates its use in cross-modal retrieval, text-driven audio equalization, and timbre-to-image generation.

Findings

01

Cross-modal retrieval evaluated on synth patches

02

Text-driven audio equalization demonstrated

03

Timbre-to-image generation demonstrated

Abstract

We present work in progress on TimbreCLIP, an audio-text cross-modal embedding trained on single instrument notes. We evaluate the models with a cross-modal retrieval task on synth patches. Finally, we demonstrate the application of TimbreCLIP on two tasks: text-driven audio equalization and timbre-to-image generation.
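
To make the retrieval evaluation concrete, the following is a minimal sketch of how text-to-audio retrieval over a shared embedding space is commonly scored. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not name the similarity measure or the metric, so cosine similarity and recall@k are assumed here, the function names are hypothetical, and the embeddings are hand-made toy vectors, not model outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_audio_for_text(text_emb, audio_embs):
    """Return audio indices sorted from most to least similar to the text query."""
    scores = [cosine(text_emb, a) for a in audio_embs]
    return sorted(range(len(audio_embs)), key=lambda i: scores[i], reverse=True)

def recall_at_k(text_embs, audio_embs, k):
    """Fraction of text queries whose paired audio (same index) ranks in the top k."""
    hits = 0
    for i, t in enumerate(text_embs):
        if i in rank_audio_for_text(t, audio_embs)[:k]:
            hits += 1
    return hits / len(text_embs)

# Toy embeddings (assumed values for illustration only).
# Index i in text_embs is the description paired with audio_embs[i].
audio_embs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
text_embs = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.7, 0.1, 0.6]]

print("recall@1:", recall_at_k(text_embs, audio_embs, 1))
print("recall@2:", recall_at_k(text_embs, audio_embs, 2))
```

In a real setup the toy vectors would be replaced by the outputs of the audio and text encoders for each synth patch and its description; the ranking and scoring logic would stay the same.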

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing