BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

Yichong Leng; Zehua Chen; Junliang Guo; Haohe Liu; Jiawei Chen; Xu Tan; Danilo Mandic; Lei He; Xiang-Yang Li; Tao Qin; Sheng Zhao; Tie-Yan Liu

arXiv:2205.14807 · eess.AS · November 30, 2022 · 29 cites

TL;DR

BinauralGrad is a two-stage diffusion framework that synthesizes binaural audio from mono recordings by decomposing the binaural signal into a part common to both channels and a part specific to each channel.

Contribution

The paper proposes a new two-stage diffusion-based approach for binaural audio synthesis, improving accuracy and fidelity over existing methods.

Findings

01 Outperforms baselines in objective and subjective metrics

02 Generates high-quality binaural audio from mono inputs

03 Achieves significant improvements in benchmark evaluations

Abstract

Binaural audio plays a significant role in constructing immersive augmented and virtual realities. As it is expensive to record binaural audio from the real world, synthesizing it from mono audio has attracted increasing attention. This synthesis process involves not only the basic physical warping of the mono audio, but also room reverberations and head/ear-related filtering, which are difficult to simulate accurately with traditional digital signal processing. In this paper, we formulate the synthesis process from a different perspective by decomposing the binaural audio into a common part that is shared by the left and right channels and a specific part that differs in each channel. Accordingly, we propose BinauralGrad, a novel two-stage framework equipped with diffusion models to synthesize them respectively. Specifically, in the first stage, the common information…
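The decomposition described in the abstract can be illustrated with a minimal sketch. The abstract as shown here is truncated and does not state how the common part is computed, so the sketch assumes, purely for illustration, that the common part is the per-sample average of the two channels and the specific part is each channel's residual from that average. The function names `decompose` and `recombine` are hypothetical, and no diffusion model is involved; the sketch only shows that such a split loses no information.

```python
def decompose(left, right):
    """Split two channels into a shared part and per-channel residuals.

    Assumption for illustration only: the common part is the per-sample
    mean of the left and right channels.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    common = [(l + r) / 2 for l, r in zip(left, right)]
    specific_left = [l - c for l, c in zip(left, common)]
    specific_right = [r - c for r, c in zip(right, common)]
    return common, specific_left, specific_right


def recombine(common, specific_left, specific_right):
    """Rebuild both channels by adding each residual back to the common part."""
    left = [c + s for c, s in zip(common, specific_left)]
    right = [c + s for c, s in zip(common, specific_right)]
    return left, right


# Toy three-sample signal (hypothetical values, not real audio).
left = [0.5, -0.25, 1.0]
right = [0.25, 0.25, 0.0]
common, spec_l, spec_r = decompose(left, right)
rebuilt_left, rebuilt_right = recombine(common, spec_l, spec_r)
```

Under this assumption the two residuals are negatives of each other, so the channel-specific information is carried entirely by the difference between the channels.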

Code & Models

Repositories

microsoft/NeuralSpeech · PyTorch · Official

Videos

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis · SlidesLive

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation

Methods: Diffusion