Network Bending of Diffusion Models for Audio-Visual Generation
Luke Dzwonczyk, Carmine Emanuele Cella, David Ban

TL;DR
This paper explores network bending in diffusion models as a way for artists to create music visualizations with fine-grained control and novel visual effects, including music-reactive videos.
Contribution
It introduces the application of network bending to diffusion models for creative visual effects and music-reactive video generation, expanding the toolset for artistic image manipulation.
Findings
Network bending produces unique visual effects not easily replicated with standard tools.
It enables continuous, fine-grained control over image generation.
Music-reactive videos can be generated by passing audio features as parameters.
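The last finding, passing audio features as operator parameters, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the choice of RMS energy as the feature, the min-max normalization, the frame size, and the function names `frame_rms` and `to_operator_params` are all assumptions made for the example.

```python
import math

def frame_rms(samples, frame_size):
    """Root-mean-square energy of each non-overlapping frame of audio."""
    starts = range(0, len(samples) - frame_size + 1, frame_size)
    return [
        math.sqrt(sum(x * x for x in samples[s:s + frame_size]) / frame_size)
        for s in starts
    ]

def to_operator_params(features, lo=0.0, hi=1.0):
    """Min-max normalize features into [lo, hi], one value per video frame."""
    fmin, fmax = min(features), max(features)
    span = fmax - fmin
    if span == 0:
        return [lo for _ in features]  # constant signal: no modulation
    return [lo + (hi - lo) * (f - fmin) / span for f in features]

# Hypothetical input: a 440 Hz tone whose amplitude ramps up over one second.
sample_rate = 8000
audio = [
    (n / sample_rate) * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate)
]

# One parameter value per 1000-sample frame; louder frames give larger values.
params = to_operator_params(frame_rms(audio, frame_size=1000))
```

Each entry of `params` would then drive a bending operator for the corresponding video frame, so that louder passages produce a stronger effect.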
Abstract
In this paper we present the first steps towards the creation of a tool which enables artists to create music visualizations using pre-trained, generative, machine learning models. First, we investigate the application of network bending, the process of applying transforms within the layers of a generative network, to image generation diffusion models by utilizing a range of point-wise, tensor-wise, and morphological operators. We identify a number of visual effects that result from various operators, including some that are not easily recreated with standard image editing tools. We find that this process allows for continuous, fine-grain control of image generation which can be helpful for creative applications. Next, we generate music-reactive videos using Stable Diffusion by passing audio features as parameters to network bending operators. Finally, we comment on certain transforms…
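The core idea of network bending, applying a transform to activations inside a pre-trained network, can be shown on a toy model. The sketch below is an assumption-laden illustration rather than the paper's implementation: it uses a two-layer network with made-up weights instead of Stable Diffusion, and `scale_op`, `generate`, and `bend_at` are hypothetical names.

```python
import math

def linear(weights, x):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def scale_op(strength):
    """Point-wise bending operator: multiply every activation by `strength`."""
    return lambda acts: [a * strength for a in acts]

def generate(x, layers, bend_at=None, bend=None):
    """Run the toy network, applying `bend` to the output of layer `bend_at`."""
    for i, weights in enumerate(layers):
        x = [math.tanh(v) for v in linear(weights, x)]
        if bend is not None and i == bend_at:
            x = bend(x)  # the weights are untouched; only activations change
    return x

# Hypothetical fixed ("pre-trained") weights for a two-layer toy network.
layers = [
    [[0.5, -0.2], [0.1, 0.8], [-0.6, 0.3]],
    [[0.7, -0.4, 0.2], [0.3, 0.9, -0.5]],
]
latent = [1.0, -1.0]

baseline = generate(latent, layers)
identity = generate(latent, layers, bend_at=0, bend=scale_op(1.0))
bent = generate(latent, layers, bend_at=0, bend=scale_op(2.0))
```

Because `strength` is a continuous value, sweeping it between 1.0 and 2.0 moves the output gradually from `baseline` towards `bent`, which is the kind of continuous control the abstract describes.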
Taxonomy
Topics: Music Technology and Sound Studies
Methods: Diffusion
