BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models
Susan Liang, Dejan Markovic, Israel D. Gebru, Steven Krenn, Todd Keebler, Jacob Sandakly, Frank Yu, Samuel Hassel, Chenliang Xu, Alexander Richard

TL;DR
BinauralFlow is a flow matching-based framework for high-quality, streamable binaural speech synthesis. It models binaural cues and room acoustics, and in perceptual tests its output is nearly indistinguishable from real recordings (42% confusion rate).
Contribution
The paper introduces a causal, streamable flow matching model for binaural speech synthesis, built on a causal U-Net architecture and a dedicated inference pipeline.
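To illustrate what "causal" and "streamable" mean in practice, the sketch below uses a plain 1-D convolution rather than the paper's network. It is a minimal example under stated assumptions: `causal_conv1d` and `StreamingCausalConv` are hypothetical names, and the only point shown is that a layer with no look-ahead can process audio chunk by chunk, carrying a small history buffer, and still reproduce the offline result.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Offline causal filter: output[n] uses only x[n], x[n-1], ... (no look-ahead)."""
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])  # pad on the left only
    # kernel[j] multiplies x[n - j]
    return np.array([np.dot(kernel, padded[n:n + k][::-1]) for n in range(len(x))])

class StreamingCausalConv:
    """Hypothetical streaming wrapper: keeps the last k-1 input samples between calls."""

    def __init__(self, kernel):
        self.kernel = np.asarray(kernel, dtype=float)
        self.history = np.zeros(len(self.kernel) - 1)

    def process(self, chunk):
        k = len(self.kernel)
        padded = np.concatenate([self.history, chunk])
        out = np.array(
            [np.dot(self.kernel, padded[n:n + k][::-1]) for n in range(len(chunk))]
        )
        # Remember the most recent k-1 samples for the next chunk.
        self.history = padded[len(padded) - (k - 1):]
        return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(37)
    kernel = [0.5, 0.3, 0.2]
    stream = StreamingCausalConv(kernel)
    streamed = np.concatenate(
        [stream.process(signal[i:i + 4]) for i in range(0, len(signal), 4)]
    )
    print(np.allclose(streamed, causal_conv1d(signal, kernel)))
```

Because each output sample depends only on current and past input, the chunk size affects latency but not the result.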
Findings
Outperforms state-of-the-art methods in quality metrics
Achieves a 42% confusion rate in perceptual tests
Enables real-time streaming binaural audio synthesis
Abstract
Binaural rendering aims to synthesize binaural audio that mimics natural hearing based on a mono audio signal and the locations of the speaker and listener. Although many methods have been proposed to solve this problem, they struggle with rendering quality and streamable inference. Synthesizing high-quality binaural audio that is indistinguishable from real-world recordings requires precise modeling of binaural cues, room reverb, and ambient sounds. Additionally, real-world applications demand streaming inference. To address these challenges, we propose a flow matching-based streaming binaural speech synthesis framework called BinauralFlow. We consider binaural rendering to be a generation problem rather than a regression problem and design a conditional flow matching model to render high-quality audio. Moreover, we design a causal U-Net architecture that estimates the current audio frame…
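The excerpt does not give the model's exact formulation, so the following is only a generic sketch of conditional flow matching with a straight-line probability path, not the authors' implementation. All names (`cfm_training_pair`, `euler_sample`, `oracle_velocity`) are hypothetical, and the "oracle" velocity function stands in for a trained network that would, in the paper's setting, be conditioned on the mono signal and the speaker and listener positions.

```python
import numpy as np

def cfm_training_pair(x0, x1, t):
    """Straight-line path from noise x0 (t=0) to data x1 (t=1) and its target velocity."""
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0  # constant along a straight path
    return x_t, v_target

def euler_sample(velocity_fn, x0, cond, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t, cond)
    return x

def oracle_velocity(x, t, cond):
    """Stand-in for a trained model: here `cond` is simply the known target sample."""
    return (cond - x) / (1.0 - t)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(8)
    target = rng.standard_normal(8)

    # A training step would regress a network's output at (x_t, t, cond) onto v_target.
    x_t, v_target = cfm_training_pair(noise, target, t=0.3)

    # At inference, integrating the velocity field carries noise to a sample.
    sample = euler_sample(oracle_velocity, noise, cond=target, num_steps=10)
    print(np.allclose(sample, target))
```

Framing rendering as generation in this way means the model learns a velocity field that transports noise to plausible outputs, instead of regressing a single deterministic waveform.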
Taxonomy
Methods: Convolution · Concatenated Skip Connection · Max Pooling · U-Net
