AudioViewer: Learning to Visualize Sounds

Chunjin Song, Yuchi Zhang, Willis Peng, Parmis Mohaghegh, Bastian Wandt, and Helge Rhodin

arXiv:2012.13341·cs.HC·February 15, 2023

TL;DR

AudioViewer is an unsupervised method that translates audio into video so that deaf and hard of hearing individuals can perceive sounds visually. It renders sounds as videos of faces and numbers that preserve high-dimensional audio features.

Contribution

It presents a novel unpaired audio-to-video translation model that learns from unlabelled data and disentangles speech content from style, advancing sensory substitution techniques.

Findings

01 The approach preserves key audio features in visualizations.

02 Videos of faces and numbers effectively represent high-dimensional audio.

03 Human studies confirm the interpretability of generated videos.

Abstract

A long-standing goal in the field of sensory substitution is to enable sound perception for deaf and hard of hearing (DHH) people by visualizing audio content. Different from existing models that translate to hand sign language, between speech and text, or text and images, we target immediate and low-level audio to video translation that applies to generic environment sounds as well as human speech. Since such a substitution is artificial, without labels for supervised learning, our core contribution is to build a mapping from audio to video that learns from unpaired examples via high-level constraints. For speech, we additionally disentangle content from style, such as gender and dialect. Qualitative and quantitative results, including a human study, demonstrate that our unpaired translation approach maintains important audio features in the generated video and that videos of faces and…

Code & Models

Repositories

ChunjinSong/audioviewer_code · PyTorch · Official

Videos

AudioViewer: Learning to Visualize Sounds · YouTube

Taxonomy

Topics: Speech and Audio Processing · Subtitles and Audiovisual Media · Music and Audio Processing