Sound-to-Imagination: An Exploratory Study on Unsupervised Crossmodal Translation Using Diverse Audiovisual Data

Leonardo A. Fanzeres; Climent Nadeu

arXiv:2106.01266 · cs.SD · March 10, 2022 · 1 citation

TL;DR

This study explores unsupervised sound-to-image translation on diverse audiovisual data. It uses conditional GANs to generate images from unknown sounds and informativity classifiers to evaluate them; over 14% of the generated images are interpretable and semantically coherent.

Contribution

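The paper introduces an unsupervised approach to sound-to-image translation on diverse data, using conditional GANs for generation and informativity classifiers for quantitative evaluation. This goes beyond prior work, which relied on low-diversity data and/or sound class supervision.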

Findings

01 Over 14% of the images generated from unknown sounds were interpretable and semantically coherent.

02 The results show a trade-off between informativity and pixel-space convergence.

03 The model generalizes to diverse, complex audiovisual data.

Abstract

The motivation of our research is to explore the possibilities of automatic sound-to-image (S2I) translation for enabling a human receiver to visually infer the occurrence of sound-related events. We expect the computer to 'imagine' the scene from the captured sound, generating original images that picture the sound-emitting source. Previous studies on similar topics opted for simplified approaches using data with low content diversity and/or sound class supervision. In contrast, we propose to perform unsupervised S2I translation using thousands of distinct and unknown scenes, with slightly pre-cleaned data, just enough to guarantee aural-visual semantic coherence. To that end, we employ conditional generative adversarial networks (GANs) with a deep densely connected generator. Additionally, we present a solution using informativity classifiers to perform quantitative evaluation of the…

Code & Models

Repositories

leofanzeres/s2i (PyTorch)

Taxonomy

Topics: Music and Audio Processing · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing