AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

Susan Liang; Chao Huang; Yapeng Tian; Anurag Kumar; Chenliang Xu

arXiv:2302.02088 · cs.CV · October 17, 2023 · 6 cites

TL;DR

This paper introduces AV-NeRF, a novel neural field approach for synthesizing realistic, spatially consistent audio-visual scenes from new viewpoints and positions, leveraging geometry-aware audio generation and a new dataset.

Contribution

The paper presents a first-of-its-kind NeRF-based method for real-world audio-visual scene synthesis, integrating acoustic propagation and source-centric modeling.

Findings

01 Effective synthesis of novel-view, novel-position videos with matching spatial audio.

02 Successful application on real-world and simulated datasets.

03 Improved realism and spatial consistency in audio-visual scene generation.

Abstract

Can machines recording an audio-visual scene produce realistic, matching audio-visual experiences at novel positions and novel view directions? We answer this question by studying a new task -- real-world audio-visual scene synthesis -- and a first-of-its-kind NeRF-based approach for multimodal learning. Concretely, given a video recording of an audio-visual scene, the task is to synthesize new videos with spatial audio along arbitrary novel camera trajectories in that scene. We propose an acoustic-aware audio generation module that integrates prior knowledge of audio propagation into NeRF, in which we implicitly associate audio generation with the 3D geometry and material properties of a visual environment. Furthermore, we present a coordinate transformation module that expresses a view direction relative to the sound source, enabling the model to learn sound source-centric acoustic fields. To…
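The coordinate transformation idea in the abstract can be illustrated with a small geometric sketch. This is not the authors' implementation: it assumes a 2D top-down view, and the function name `relative_direction` and its arguments are hypothetical. It only shows what "expressing a view direction relative to the sound source" could mean in practice: the listener's pose is reduced to a distance and a signed angle measured against the listener-to-source direction.

```python
import math


def relative_direction(listener_xy, view_dir_xy, source_xy):
    """Return (distance, angle) of the source relative to the listener's view.

    Hypothetical helper for illustration only. The angle is in radians in
    [-pi, pi]; 0 means the listener faces the source, and positive values
    mean the source lies counter-clockwise from the view direction.
    """
    # Vector from the listener to the sound source.
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)

    # Signed angle from the view direction to the listener-to-source vector,
    # computed from the 2D cross and dot products.
    cross = view_dir_xy[0] * dy - view_dir_xy[1] * dx
    dot = view_dir_xy[0] * dx + view_dir_xy[1] * dy
    angle = math.atan2(cross, dot)
    return distance, angle


# Two poses that differ in global coordinates but share the same geometry
# relative to the source produce the same (distance, angle) pair.
pose_a = relative_direction((0.0, 0.0), (1.0, 0.0), (0.0, 2.0))
pose_b = relative_direction((5.0, 5.0), (0.0, 1.0), (3.0, 5.0))
```

Under these assumptions, a model fed `(distance, angle)` instead of absolute coordinates would see identical inputs for `pose_a` and `pose_b`, which is one plausible reading of a source-centric acoustic field.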

Code & Models

Repositories

aluo-x/learning_neural_acoustic_fields (PyTorch)

Videos

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis · SlidesLive

Taxonomy

Topics: Music and Audio Processing · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging