Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks
Russell Sammut Bonnici, Charalampos Saitis, Martin Benning

TL;DR
This paper explores deep learning methods combining Variational Autoencoders and GANs for timbre transfer across speakers and instruments, demonstrating improved reconstruction and transfer quality on multiple datasets.
Contribution
It introduces a novel approach integrating VAEs and GANs for timbre transfer, comparing different architectures and loss functions for enhanced performance.
Findings
Many-to-many transfer outperforms one-to-one in reconstruction.
Basic residual blocks are more effective than bottleneck designs.
Cyclic loss choice has minimal impact on transfer quality.
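As a rough illustration of the structural difference between the two residual designs named in the findings, the sketch below counts convolution weights for each. The channel width (256), the reduction factor (4), and the helper names are assumptions chosen for illustration; this summary does not state the paper's actual configuration, and biases and batch-normalisation parameters are ignored.

```python
def conv_weights(c_in, c_out, k):
    """Number of weights in a k x k convolution (bias and batch norm ignored)."""
    return c_in * c_out * k * k


def basic_block_weights(channels):
    # Basic residual block: two 3x3 convolutions at full channel width.
    return 2 * conv_weights(channels, channels, 3)


def bottleneck_block_weights(channels, reduction=4):
    # Bottleneck block: 1x1 reduce, 3x3 at the reduced width, 1x1 expand.
    mid = channels // reduction
    return (
        conv_weights(channels, mid, 1)
        + conv_weights(mid, mid, 3)
        + conv_weights(mid, channels, 1)
    )


channels = 256  # hypothetical width, not taken from the paper
basic = basic_block_weights(channels)
bottleneck = bottleneck_block_weights(channels)
print(f"basic: {basic}, bottleneck: {bottleneck}, ratio: {basic / bottleneck:.1f}")
```

At equal width, the basic block carries far more weights than the bottleneck block. The sketch only shows how the designs differ; it does not establish why one performed better in the paper's experiments.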
Abstract
This research project investigates the application of deep learning to timbre transfer, where the timbre of a source audio can be converted to the timbre of a target audio with minimal loss in quality. The adopted approach combines Variational Autoencoders with Generative Adversarial Networks to construct meaningful representations of the source audio and produce realistic generations of the target audio. It is applied to the Flickr 8k Audio dataset for transferring vocal timbre between speakers and to the URMP dataset for transferring musical timbre between instruments. Furthermore, variations of the adopted approach are trained, and generalised performance is compared using the metrics SSIM (Structural Similarity Index) and FAD (Fréchet Audio Distance). It was found that a many-to-many approach supersedes a one-to-one approach in terms of reconstructive capabilities, and that…
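The FAD metric mentioned in the abstract is a Fréchet distance between two Gaussians fitted to sets of audio embeddings. A minimal sketch of that distance follows, assuming the embeddings are already available as NumPy arrays. In practice FAD embeddings come from a pretrained audio model; the random arrays and the function name `frechet_distance` here are stand-ins for illustration, not the paper's evaluation code.

```python
import numpy as np


def frechet_distance(emb_a, emb_b):
    """Frechet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (n_samples, n_features).
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)

    # Tr(sqrt(cov_a @ cov_b)) equals the sum of square roots of the
    # eigenvalues of the product, which are real and non-negative for
    # positive semi-definite covariances (clip guards against round-off).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 8))  # stand-in for real audio embeddings
shifted = reference + 0.5              # same spread, different mean

d_same = frechet_distance(reference, reference)
d_shift = frechet_distance(reference, shifted)
print(d_same, d_shift)
```

When the two sets share a covariance, the covariance terms cancel and the distance reduces to the squared distance between the means, so a lower value indicates generated audio whose embedding statistics sit closer to the reference set.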
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies
Methods: Residual Connection · 1x1 Convolution · Batch Normalization · Convolution · Bottleneck Residual Block · Residual Block
