LISA: Localized Image Stylization with Audio via Implicit Neural Representation
Seung Hyun Lee, Chanyoung Kim, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

TL;DR
LISA introduces a novel audio-driven localized image stylization framework that uses implicit neural representations and CLIP-based localization to selectively stylize image regions based on sound, outperforming existing methods.
Contribution
The paper proposes a new framework combining audio-visual localization with implicit neural representations for targeted image stylization based on sound.
Findings
Outperforms existing audio-guided stylization methods.
Constructs concise localization maps linked to audio input.
Effectively manipulates specific image regions in accordance with sound.
Abstract
We present a novel framework, Localized Image Stylization with Audio (LISA), which performs audio-driven localized image stylization. Sound often provides information about the specific context of a scene and is closely related to a certain part of the scene or object. However, existing image stylization works have focused on stylizing the entire image using an image or text input. Stylizing a particular part of the image based on audio input is natural but challenging. In this work, we propose a framework in which a user provides one audio input to localize the sound source in the input image and another to locally stylize the target object or scene. LISA first produces a delicate localization map with an audio-visual localization network by leveraging the CLIP embedding space. We then utilize implicit neural representation (INR) along with the predicted localization map to stylize the…
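The two-stage idea in the abstract (localize with audio, then stylize only the localized region) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the audio clip and each image patch have already been embedded into a shared space, it substitutes a simple min-max-normalized cosine similarity for the audio-visual localization network, and it substitutes a plain per-pixel blend for the INR-based stylization. The names `cosine`, `localization_map`, and `blend` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def localization_map(audio_emb, patch_embs):
    """Score each patch against the audio embedding and rescale scores to [0, 1].

    patch_embs is an H x W grid (list of rows) of embedding vectors.
    Assumption: audio and patch embeddings already live in the same space.
    """
    sims = [[cosine(audio_emb, p) for p in row] for row in patch_embs]
    flat = [s for row in sims for s in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # All patches match equally well, so no region stands out.
        return [[0.0 for _ in row] for row in sims]
    return [[(s - lo) / span for s in row] for row in sims]


def blend(original, stylized, mask):
    """Per-pixel blend: mask 1 keeps the stylized value, mask 0 keeps the original."""
    return [
        [m * s + (1 - m) * o for o, s, m in zip(o_row, s_row, m_row)]
        for o_row, s_row, m_row in zip(original, stylized, mask)
    ]


# Toy 2x2 example with 2-dimensional embeddings and single-channel pixels.
audio_emb = [1.0, 0.0]
patch_embs = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
mask = localization_map(audio_emb, patch_embs)
original = [[0.0, 0.0], [0.0, 0.0]]
stylized = [[10.0, 10.0], [10.0, 10.0]]
result = blend(original, stylized, mask)
```

In this toy case only the two patches whose embeddings align with the audio embedding receive the stylized values; the other patches keep their original pixels.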
Taxonomy
Topics: Music and Audio Processing · Video Analysis and Summarization · Handwritten Text Recognition Techniques
Methods: Contrastive Language-Image Pre-training
