MusicJam: Visualizing Music Insights via Generated Narrative Illustrations
Chuer Chen, Nan Cao, Jiani Hou, Yi Guo, Yulei Zhang, Yang Shi

TL;DR
MusicJam is a system that creates immersive music videos: it generates lyrics from music with GPT-2, then turns those lyrics into illustrations with Stable Diffusion to visualize music insights.
Contribution
The paper introduces a novel system that combines GPT-2 and Stable Diffusion to generate narrative illustrations synchronized with the music.
Findings
The lyric generation model outperforms baseline models.
User study confirms high quality of generated illustrations.
Generated music videos effectively visualize music insights.
Abstract
Visualizing the insights of music, which is itself invisible, gives listeners an enjoyable and immersive listening experience and has therefore attracted much attention in the field of information visualization. Over the past decades, various music visualization techniques have been introduced. However, most of them are manually designed according to visual encoding rules and therefore take the form of graphical representations whose encoding schemes take effort to understand. Recently, some researchers have used figures or illustrations to represent music moods, lyrics, and musical features, which are more intuitive and attractive. However, in these techniques the figures are usually pre-selected or statically generated, so they cannot precisely convey the insights of different pieces of music. To address this issue, in this paper, we introduce MusicJam, a music…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Video Analysis and Summarization
