Learning Visual Styles from Audio-Visual Associations

Tingle Li; Yichen Liu; Andrew Owens; Hang Zhao

arXiv:2205.05072 · cs.CV · May 11, 2022

TL;DR

This paper introduces a method for learning visual styles from unlabeled audio-visual data. The method edits the textures in an image so that they match a given sound, and it outperforms label-based methods in quantitative and qualitative evaluations.

Contribution

It introduces audio-driven image stylization, an approach that learns from unlabeled audio-visual pairs to modify images so that they match a given sound.

Findings

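01. The sound-based model outperforms label-based approaches in quantitative and qualitative evaluations.

02. Manipulating the audio (changing its volume or mixing two sounds) leads to predictable changes in visual style.

03. The method learns visual textures from unlabeled audio-visual pairs.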
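Finding 02 refers to two simple edits to the conditioning sound: changing its volume and mixing two sounds. The NumPy sketch below shows what those edits could look like. It assumes they are applied to mono raw waveforms at a shared sample rate, which the page does not specify. The helper names `scale_volume` and `mix_sounds` and the linear mixing weight `alpha` are hypothetical illustrations and are not taken from the authors' code.

```python
import numpy as np

def scale_volume(wave: np.ndarray, gain: float) -> np.ndarray:
    """Change loudness by a linear gain (gain > 1 is louder, gain < 1 is quieter)."""
    return gain * wave

def mix_sounds(a: np.ndarray, b: np.ndarray, alpha: float) -> np.ndarray:
    """Linearly blend two equal-length waveforms: alpha=1 returns a, alpha=0 returns b."""
    if a.shape != b.shape:
        raise ValueError("waveforms must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * a + (1.0 - alpha) * b

# Two 1-second synthetic tones at 16 kHz stand in for real recordings
# (e.g., rain and snow); in practice, load actual audio clips instead.
sr = 16000
t = np.arange(sr) / sr
tone_a = 0.5 * np.sin(2 * np.pi * 440 * t)
tone_b = 0.5 * np.sin(2 * np.pi * 220 * t)

louder_a = scale_volume(tone_a, 2.0)
# A sweep of conditioning sounds from pure tone_b (alpha=0) to pure tone_a (alpha=1).
sweep = [mix_sounds(tone_a, tone_b, alpha) for alpha in np.linspace(0.0, 1.0, 5)]
```

Under this assumption, each waveform in `sweep` would be passed to the stylization model as its conditioning sound. Finding 02 says the resulting images change style in a predictable way as `alpha` or `gain` varies.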

Abstract

From the patter of rain to the crunch of snow, the sounds we hear often convey the visual textures that appear within a scene. In this paper, we present a method for learning visual styles from unlabeled audio-visual data. Our model learns to manipulate the texture of a scene to match a sound, a problem we term audio-driven image stylization. Given a dataset of paired audio-visual data, we learn to modify input images such that, after manipulation, they are more likely to co-occur with a given input sound. In quantitative and qualitative evaluations, our sound-based model outperforms label-based approaches. We also show that audio can be an intuitive representation for manipulating images, as adjusting a sound's volume or mixing two sounds together results in predictable changes to visual style. Project webpage: https://tinglok.netlify.app/files/avstyle
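The abstract describes the goal as editing an image so that it becomes "more likely to co-occur" with a given sound. One common way to score co-occurrence is audio-visual contrastive learning. The sketch below is a minimal NumPy version of an InfoNCE-style loss over a batch of paired embeddings. It assumes that image and audio embeddings come from some encoders (not shown) and uses a temperature of 0.07. Both the function name `av_infonce_loss` and these settings are illustrative assumptions. This is the general technique only and should not be read as the paper's exact loss or architecture.

```python
import numpy as np

def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def av_infonce_loss(img_emb: np.ndarray, aud_emb: np.ndarray,
                    temperature: float = 0.07) -> float:
    """InfoNCE: image i should match sound i and not the other sounds in the batch.

    A lower loss means each image is more likely to co-occur with its paired sound.
    """
    img = l2_normalize(img_emb)
    aud = l2_normalize(aud_emb)
    logits = img @ aud.T / temperature                   # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # subtract row max for stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # matching pairs lie on the diagonal

# Toy check: embeddings aligned with their sounds score better than random ones.
rng = np.random.default_rng(0)
aud = rng.normal(size=(8, 128))
aligned_img = aud + 0.01 * rng.normal(size=(8, 128))
random_img = rng.normal(size=(8, 128))
loss_aligned = av_infonce_loss(aligned_img, aud)
loss_random = av_infonce_loss(random_img, aud)
```

In a stylization setting, `img_emb` would be computed from the manipulated image. Minimizing this loss would then push the edited image toward the conditioning sound. A separate term, not shown here, would typically be needed to preserve the input image's content.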

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Animal Vocal Communication and Behavior