Points2Sound: From mono to binaural audio using 3D point cloud scenes

Francesc Lluís; Vasileios Chatziioannou; Alex Hofmann

arXiv:2104.12462 · cs.SD · May 22, 2023


TL;DR

Points2Sound is a deep learning model that converts mono audio into binaural audio by leveraging 3D point cloud visual data, enhancing immersive virtual experiences.
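Binaural audio has a left and a right channel, while the mono input has only one. A common way to relate the two, assumed here purely for illustration and not confirmed as Points2Sound's output parameterization, treats the mono signal as the sum of the channels (mono = left + right) and has a network predict the difference signal (difference = left - right). Both channels then follow by simple arithmetic. The function name `channels_from_mono_and_difference` is hypothetical.

```python
from typing import List, Tuple


def channels_from_mono_and_difference(
    mono: List[float], difference: List[float]
) -> Tuple[List[float], List[float]]:
    """Recover left/right channels, assuming mono = left + right and
    difference = left - right (a hypothetical parameterization)."""
    if len(mono) != len(difference):
        raise ValueError("mono and difference must have the same length")
    left = [(m + d) / 2.0 for m, d in zip(mono, difference)]
    right = [(m - d) / 2.0 for m, d in zip(mono, difference)]
    return left, right


# Toy check: build mono and difference signals from known channels, then invert.
true_left = [0.5, -0.25, 1.0]
true_right = [0.125, 0.25, -1.0]
mono = [l + r for l, r in zip(true_left, true_right)]
difference = [l - r for l, r in zip(true_left, true_right)]
left, right = channels_from_mono_and_difference(mono, difference)
print(left, right)
```

Under this formulation the network only has to infer the spatial difference between the ears; the shared content is already present in the mono input.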

Contribution

This work introduces a novel multi-modal deep learning approach that uses 3D point cloud scenes to guide binaural audio synthesis from mono signals, extending previous 2D visual guidance methods.
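The abstract states that a visual feature "conditions" the audio network, but this truncated summary does not say how. As one hypothetical illustration, the sketch below swaps in feature-wise linear modulation (FiLM): the visual feature is projected to a per-channel scale (`gamma`) and shift (`beta`) that are applied to the audio network's intermediate activations. The functions `linear` and `film_condition` and all weights are assumptions, not the authors' mechanism.

```python
from typing import List


def linear(x: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    """Dense layer: y[i] = sum_j weight[i][j] * x[j] + bias[i]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def film_condition(
    audio_features: List[List[float]],  # shape [channels][time]
    visual_feature: List[float],        # shape [visual_dim]
    w_gamma: List[List[float]],
    b_gamma: List[float],
    w_beta: List[List[float]],
    b_beta: List[float],
) -> List[List[float]]:
    """Scale and shift each audio channel by amounts predicted from the
    visual feature (FiLM-style; hypothetical, for illustration only)."""
    gamma = linear(visual_feature, w_gamma, b_gamma)  # one scale per audio channel
    beta = linear(visual_feature, w_beta, b_beta)     # one shift per audio channel
    return [
        [g * a + b for a in channel]
        for channel, g, b in zip(audio_features, gamma, beta)
    ]


audio = [[1.0, 2.0, 3.0], [0.5, -0.5, 0.0]]  # 2 channels, 3 time steps
visual = [1.0, 0.0]                          # toy 2-D visual feature
w_gamma = [[2.0, 0.0], [0.0, 1.0]]
b_gamma = [0.0, 1.0]
w_beta = [[0.0, 0.0], [1.0, 0.0]]
b_beta = [0.5, 0.0]
out = film_condition(audio, visual, w_gamma, b_gamma, w_beta, b_beta)
print(out)
```

Because the same visual feature modulates every time step, the scene description acts as a global context for how the mono signal should be spatialized.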

Findings

1. 3D visual information effectively guides binaural synthesis.
2. Model performance varies with scene attributes and reverberation.
3. Multiple mono signals and source counts impact synthesis quality.

Abstract

For immersive applications, the generation of binaural sound that matches its visual counterpart is crucial to bring meaningful experiences to people in a virtual environment. Recent studies have shown the possibility of using neural networks for synthesizing binaural audio from mono audio by using 2D visual information as guidance. Extending this approach by guiding the audio with 3D visual information and operating in the waveform domain may allow for a more accurate auralization of a virtual audio scene. We propose Points2Sound, a multi-modal deep learning model which generates a binaural version from mono audio using 3D point cloud scenes. Specifically, Points2Sound consists of a vision network and an audio network. The vision network uses 3D sparse convolutions to extract a visual feature from the point cloud scene. Then, the visual feature conditions the audio network, which…
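The abstract states that the vision network extracts a visual feature from the point cloud with 3D sparse convolutions. The plain-Python sketch below illustrates the general idea, not the authors' implementation: points are quantized into voxels, features are stored only for occupied voxels, and a 3×3×3 kernel is evaluated only at occupied output sites (a "submanifold" variant, chosen here for simplicity; whether Points2Sound uses it is an assumption). A global max pool per kernel then reduces the sparse map to a small vector standing in for the visual feature. All function names, the voxel size, and the kernel weights are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]


def voxelize(points: List[Tuple[float, float, float, float]],
             voxel_size: float) -> Dict[Voxel, float]:
    """Quantize (x, y, z, feature) points to integer voxel coordinates,
    averaging the features of points that fall into the same voxel."""
    sums: Dict[Voxel, float] = {}
    counts: Dict[Voxel, int] = {}
    for x, y, z, f in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        sums[key] = sums.get(key, 0.0) + f
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def sparse_conv3d(features: Dict[Voxel, float],
                  kernel: Dict[Voxel, float]) -> Dict[Voxel, float]:
    """Submanifold sparse convolution: outputs exist only at occupied voxels,
    and each output sums kernel[offset] * feature over occupied neighbours."""
    out: Dict[Voxel, float] = {}
    for (x, y, z) in features:
        acc = 0.0
        for (dx, dy, dz), w in kernel.items():
            neighbour = (x + dx, y + dy, z + dz)
            if neighbour in features:  # empty space contributes nothing
                acc += w * features[neighbour]
        out[(x, y, z)] = acc
    return out


def global_max_pool(features: Dict[Voxel, float]) -> float:
    """Collapse a sparse feature map to one value, regardless of its size."""
    return max(features.values())


points = [
    (0.1, 0.1, 0.1, 1.0),
    (0.2, 0.3, 0.1, 3.0),  # same voxel as the first point
    (0.6, 0.1, 0.1, 4.0),  # neighbouring voxel
    (4.6, 4.6, 4.6, 5.0),  # isolated voxel far away
]
voxels = voxelize(points, voxel_size=0.5)

mean_kernel = {off: 1.0 / 27 for off in product((-1, 0, 1), repeat=3)}
identity_kernel = {(0, 0, 0): 1.0}
kernels = [identity_kernel, mean_kernel]

# One pooled value per kernel gives a tiny "visual feature" vector.
visual_feature = [global_max_pool(sparse_conv3d(voxels, k)) for k in kernels]
print(voxels, visual_feature)
```

Storing only occupied voxels keeps memory and compute proportional to the number of points rather than to the volume of the scene, which is the main motivation for sparse convolutions on point clouds.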

Code & Models

Repositories

francesclluis/points2sound (PyTorch, official)

Taxonomy

Topics: Hearing Loss and Rehabilitation · Speech and Audio Processing · Acoustic Wave Phenomena Research

Methods: Sparse Convolutions