# Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality

**Authors:** Aakanksha Rana, Cagri Ozcinar, Aljoscha Smolic

arXiv: 1908.06752 · 2019-08-20

## TL;DR

This paper introduces a deep learning pipeline that generates Ambisonic (full-sphere surround) sound for 360-degree videos from audio-visual cues, together with a new dataset of 265 annotated videos and evaluation criteria for benchmarking.

## Contribution

It presents a new dataset of 265 videos with sound-source annotations and a deep learning pipeline for automatic Ambisonic sound estimation from 360-degree audio-visual data.

## Key findings

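- The pipeline estimates 3D sound-source locations from deep audio-visual feature embeddings.
- It uses the estimated locations to encode the sound sources into the Ambisonics B-format.
- The proposed evaluation criteria compare performance across different 360-degree input representations and provide a benchmark for future research.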

## Abstract

Ambisonics, i.e., full-sphere surround sound, is essential alongside 360-degree visual content to provide a realistic virtual reality (VR) experience. While the capture of 360-degree visual content has gained a tremendous boost recently, the estimation of the corresponding spatial sound is still challenging because it requires either sound-field microphones or information about the sound-source locations. In this paper, we introduce a novel problem of generating Ambisonics in 360-degree videos using audio-visual cues. With this aim, firstly, a novel 360-degree audio-visual video dataset of 265 videos is introduced with annotated sound-source locations. Secondly, a pipeline is designed for the automatic Ambisonic estimation problem. Benefiting from deep learning-based audio-visual feature-embedding and prediction modules, our pipeline estimates the 3D sound-source locations and further uses these locations to encode the sound into the B-format. To benchmark our dataset and pipeline, we additionally propose evaluation criteria to investigate the performance using different 360-degree input representations. Our results demonstrate the efficacy of the proposed pipeline and open up a new area of research in 360-degree audio-visual analysis for future investigations.
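
The last step described in the abstract, encoding a sound source at a known direction into the B-format, can be illustrated with a minimal sketch. It assumes the traditional first-order B-format convention (W scaled by 1/√2; X, Y, Z as the front, left, and up components); this summary does not state which normalisation or channel ordering the paper uses. The function names `pixel_to_angles` and `encode_b_format` are hypothetical, as is the assumed equirectangular layout (image centre facing front, azimuth increasing to the left).

```python
import math


def pixel_to_angles(x, y, width, height):
    """Map an equirectangular pixel to (azimuth, elevation) in degrees.

    Assumption: the image centre is the front direction, azimuth grows
    to the left (counter-clockwise) and elevation grows upwards.
    """
    azimuth = (0.5 - x / width) * 360.0
    elevation = (0.5 - y / height) * 180.0
    return azimuth, elevation


def encode_b_format(samples, azimuth_deg, elevation_deg):
    """Encode mono samples into first-order B-format channels W, X, Y, Z.

    Assumption: traditional B-format gains, with W attenuated by 1/sqrt(2).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gains = {
        "W": 1.0 / math.sqrt(2.0),          # omnidirectional component
        "X": math.cos(az) * math.cos(el),   # front-back
        "Y": math.sin(az) * math.cos(el),   # left-right
        "Z": math.sin(el),                  # up-down
    }
    return {ch: [g * s for s in samples] for ch, g in gains.items()}


# Usage: a source annotated at the centre of a 1920x1080 equirectangular frame.
az, el = pixel_to_angles(960, 540, 1920, 1080)
b_format = encode_b_format([1.0, 0.5, -0.25], az, el)
```

In the pipeline described by the abstract, the direction would come from the learned prediction module rather than from a manual annotation; the encoding step itself only needs the mono signal and the two angles.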

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.06752/full.md

## Figures

26 figures with captions in the complete paper: https://tomesphere.com/paper/1908.06752/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1908.06752/full.md

---
Source: https://tomesphere.com/paper/1908.06752