Self-Supervised Generation of Spatial Audio for 360 Video
Pedro Morgado, Nuno Vasconcelos, Timothy Langlois, Oliver Wang

TL;DR
This paper presents a neural-network method that converts the mono audio of a 360 video into spatial audio, enabling immersive viewing without a spatial audio microphone.
Contribution
It introduces an end-to-end trainable system that separates individual sound sources and localizes them on the viewing sphere, conditioned on multi-modal analysis of audio and 360 video frames and trained with ground-truth spatial audio as self-supervision.
Findings
Successfully infers spatial sound source locations from mono audio and 360 video.
Introduces several datasets, including one filmed by the authors and one collected in the wild from YouTube 360 videos uploaded with spatial audio.
Demonstrates improved spatial audio generation for immersive 360 video experiences.
Abstract
We introduce an approach to convert mono audio recorded by a 360 video camera into spatial audio, a representation of the distribution of sound over the full viewing sphere. Spatial audio is an important component of immersive 360 video viewing, but spatial audio microphones are still rare in current 360 video production. Our system consists of end-to-end trainable neural networks that separate individual sound sources and localize them on the viewing sphere, conditioned on multi-modal analysis of audio and 360 video frames. We introduce several datasets, including one filmed ourselves, and one collected in-the-wild from YouTube, consisting of 360 videos uploaded with spatial audio. During training, ground-truth spatial audio serves as self-supervision and a mixed down mono track forms the input to our network. Using our approach, we show that it is possible to infer the spatial…
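The self-supervised setup described in the abstract (spatial audio as the target, a mixed-down mono track as the input) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes the spatial audio is stored as first-order ambisonics in the AmbiX convention (channel order W, Y, Z, X with SN3D normalization) and that the omnidirectional W channel stands in for the mono mixdown; the abstract states neither. The function names `encode_foa` and `make_training_pair` are hypothetical.

```python
import numpy as np


def encode_foa(mono, azimuth, elevation):
    """Encode a mono signal arriving from one direction as first-order ambisonics.

    Assumes AmbiX conventions (ACN channel order W, Y, Z, X; SN3D gains).
    Angles are in radians; azimuth is measured counter-clockwise from the front.
    Returns an array of shape (4, num_samples).
    """
    gains = np.array([
        1.0,                                  # W: omnidirectional
        np.sin(azimuth) * np.cos(elevation),  # Y: left-right
        np.sin(elevation),                    # Z: up-down
        np.cos(azimuth) * np.cos(elevation),  # X: front-back
    ])
    return gains[:, None] * mono[None, :]


def make_training_pair(foa):
    """Split a first-order ambisonic clip into (mono input, directional targets).

    Assumption: the W channel is used as the mono mixdown, and the remaining
    three channels are what a model would be trained to predict.
    """
    mono_input = foa[0]
    target = foa[1:]
    return mono_input, target


# Toy example: a 440 Hz tone placed directly to the listener's left.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
mono = np.sin(2 * np.pi * 440 * t)
foa = encode_foa(mono, azimuth=np.pi / 2, elevation=0.0)
mono_input, target = make_training_pair(foa)
```

In this toy case all of the directional energy lands in the Y (left-right) channel, while Z and X stay at zero up to floating-point error. A model trained on such pairs would see only `mono_input` plus the video frames and would be asked to recover `target`.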
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Music Technology and Sound Studies
