BatVision with GCC-PHAT Features for Better Sound to Vision Predictions
Jesper Haahr Christensen, Sascha Hornauer, Stella Yu

TL;DR
This paper improves BatVision, a sound-to-vision generative model inspired by bat echolocation, by adding GCC-PHAT input features, a generator based on residual learning, and spectral normalization in the discriminator. The result is better depth and grayscale predictions from binaural sound.
Contribution
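The paper replaces BatVision's raw binaural waveform input with GCC-PHAT features, bases the generator on residual learning, and applies spectral normalization in the discriminator, improving depth and grayscale estimation and perceptual quality over the previous model.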
Findings
Improved depth and grayscale estimation accuracy.
Enhanced perceptual quality of generated images.
Quantitative and qualitative performance gains over prior models.
Abstract
Inspired by sophisticated echolocation abilities found in nature, we train a generative adversarial network to predict plausible depth maps and grayscale layouts from sound. To achieve this, our sound-to-vision model processes binaural echo-returns from chirping sounds. We build upon previous work with BatVision that consists of a sound-to-vision model and a self-collected dataset using our mobile robot and low-cost hardware. We improve on the previous model by introducing several changes, which lead to better depth and grayscale estimation and increased perceptual quality. Rather than using raw binaural waveforms as input, we generate generalized cross-correlation (GCC) features and use these as input instead. In addition, we change the model generator and base it on residual learning and use spectral normalization in the discriminator. We compare and present both…
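To make the GCC input features mentioned in the abstract concrete, here is a minimal NumPy sketch of generalized cross-correlation with phase transform (GCC-PHAT) between two channels. It is not the authors' implementation: the function name `gcc_phat`, the `max_lag` and `eps` parameters, the zero-padding choice, and the sign convention for lags are all assumptions made for illustration.

```python
import numpy as np


def gcc_phat(left, right, max_lag=None, eps=1e-12):
    """Illustrative GCC-PHAT between two 1-D signals (hypothetical helper).

    Returns (lags, cc). With this sign convention, a peak at a positive
    lag means `left` is delayed relative to `right` by that many samples.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    # Zero-pad to the combined length so the circular correlation computed
    # through the FFT does not wrap around.
    n = len(left) + len(right)
    spec_left = np.fft.rfft(left, n=n)
    spec_right = np.fft.rfft(right, n=n)

    # Cross-power spectrum of the two channels.
    cross = spec_left * np.conj(spec_right)

    # PHAT weighting: divide by the magnitude so only phase remains.
    # eps is an assumed small constant that guards against division by zero.
    cross = cross / (np.abs(cross) + eps)

    # Back to the lag domain.
    cc = np.fft.irfft(cross, n=n)

    # Keep lags in [-max_lag, +max_lag], centered on zero lag.
    limit = (n - 1) // 2
    max_lag = limit if max_lag is None else min(max_lag, limit)
    cc = np.roll(cc, max_lag)[: 2 * max_lag + 1]
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc


if __name__ == "__main__":
    # Synthetic check: the left channel is the right channel delayed by 5 samples.
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(1024)
    delayed = np.concatenate((np.zeros(5), signal[:-5]))
    lags, cc = gcc_phat(delayed, signal, max_lag=20)
    print("estimated delay in samples:", lags[np.argmax(cc)])
```

Because the PHAT weighting discards magnitude, the correlation peak depends mainly on the relative delay between the two channels rather than on signal level. How the paper windows the recordings, selects lags, or arranges the resulting features as network input is not stated in the abstract and is not assumed here.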
Taxonomy
Topics: Speech and Audio Processing · Advanced Vision and Imaging · Music and Audio Processing
Methods: Spectral Normalization
