Neural Style Transfer for Audio Spectograms
Prateek Verma, Julius O. Smith

TL;DR
This paper introduces a neural style transfer method for audio spectrograms, enabling artistic sound transformations like bandwidth modification and timbral transfer using a unified neural architecture.
Contribution
It adapts image style transfer techniques to audio, so that a single neural model can perform diverse sound transformations that previously required separate, hand-tuned signal processing pipelines.
Findings
Successfully performed bandwidth expansion and compression.
Achieved timbral transfer from singing voice to instruments.
Unified approach reduces need for multiple specialized pipelines.
Abstract
There has been fascinating work on creating artistic transformations of images by Gatys. This was revolutionary in how we can in some sense alter the 'style' of an image while generally preserving its 'content'. In our work, we present a method for creating new sounds using a similar approach, treating it as a style-transfer problem, starting from a random-noise input signal and iteratively using back-propagation to optimize the sound to conform to filter-outputs from a pre-trained neural architecture of interest. For demonstration, we investigate two different tasks, resulting in bandwidth expansion/compression, and timbral transfer from singing voice to musical instruments. A feature of our method is that a single architecture can generate these different audio-style-transfer types using the same set of parameters which otherwise require different complex hand-tuned diverse signal…
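The optimization loop described in the abstract (start from random noise, then repeatedly back-propagate so that the input's filter outputs match those of the targets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a fixed random projection followed by a ReLU stands in for the pre-trained network, the content and Gram-matrix style losses follow the image style-transfer formulation, and the names `style_transfer`, `features`, `gram`, `alpha`, and `beta` are hypothetical. The sketch works on magnitude spectrograms of shape (frequency bins, time frames) and does not cover converting the result back to a waveform.

```python
import numpy as np


def features(W, X):
    """Filter outputs of the stand-in network: a linear map over frequency, then ReLU."""
    return np.maximum(W @ X, 0.0)


def gram(F):
    """Channel-by-channel correlations, averaged over time frames (the 'style' statistic)."""
    return F @ F.T / F.shape[1]


def loss_and_grad(X, W, F_content, G_style, alpha, beta):
    """Return the combined loss and its gradient with respect to the input X."""
    Z = W @ X
    F = np.maximum(Z, 0.0)
    dC = F - F_content          # content mismatch, per channel and frame
    dG = gram(F) - G_style      # style mismatch, per pair of channels
    loss = 0.5 * alpha * np.sum(dC ** 2) + 0.5 * beta * np.sum(dG ** 2)
    # Back-propagate by hand: through the Gram matrix, the ReLU, and the linear map.
    dF = alpha * dC + beta * (2.0 / F.shape[1]) * dG @ F
    dX = W.T @ (dF * (Z > 0))
    return loss, dX


def style_transfer(content, style, n_channels=32, steps=200,
                   alpha=1.0, beta=10.0, seed=0):
    """Optimize a noise spectrogram toward content features and style statistics."""
    rng = np.random.default_rng(seed)
    n_freq = content.shape[0]
    # ASSUMPTION: random fixed filters replace the pre-trained network.
    W = rng.standard_normal((n_channels, n_freq)) / np.sqrt(n_freq)
    F_content = features(W, content)
    G_style = gram(features(W, style))

    X = 0.1 * rng.standard_normal(content.shape)  # random-noise starting point
    lr = 0.1
    loss, grad = loss_and_grad(X, W, F_content, G_style, alpha, beta)
    history = [loss]
    for _ in range(steps):
        X_new = X - lr * grad
        new_loss, new_grad = loss_and_grad(X_new, W, F_content, G_style, alpha, beta)
        if new_loss < loss:     # accept the step and grow the step size slightly
            X, loss, grad = X_new, new_loss, new_grad
            lr *= 1.1
        else:                   # reject the step and shrink the step size
            lr *= 0.5
        history.append(loss)
    return X, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    content = np.abs(rng.standard_normal((64, 40)))  # synthetic stand-in spectrograms
    style = np.abs(rng.standard_normal((64, 50)))
    result, history = style_transfer(content, style)
    print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because the style statistic is a Gram matrix over channels, the style spectrogram may have a different number of time frames than the content spectrogram; only the number of frequency bins must agree. The accept-or-shrink step rule is a simple choice made here so the loss never increases; it is not taken from the paper.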
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
