Neural Network-based Virtual Microphone Estimator
Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki

TL;DR
This paper introduces a neural network-based method for estimating virtual microphone signals directly from real multi-channel recordings, improving speech enhancement and recognition without relying on physical model assumptions.
Contribution
It proposes a fully supervised neural network approach for virtual microphone estimation that works directly on real recordings, bypassing traditional physical model limitations.
Findings
High estimation accuracy on real recordings
Improved speech enhancement performance
Enhanced speech recognition results
Abstract
Developing microphone array technologies for a small number of microphones is important due to the constraints of many devices. One direction to address this situation consists of virtually augmenting the number of microphone signals, e.g., based on several physical model assumptions. However, such assumptions are not necessarily met in realistic conditions. In this paper, as an alternative approach, we propose a neural network-based virtual microphone estimator (NN-VME). The NN-VME estimates virtual microphone signals directly in the time domain, by utilizing the precise estimation capability of the recent time-domain neural networks. We adopt a fully supervised learning framework that uses actual observations at the locations of the virtual microphones at training time. Consequently, the NN-VME can be trained using only multi-channel observations and thus directly on real recordings,…
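The supervised setup described in the abstract can be illustrated with a minimal sketch: one channel of a real multi-channel recording is held out as the training target at the virtual microphone position, and the remaining channels serve as the network input. The function names (`make_training_pair`, `snr_db`, `average_baseline`), the channel layout, and the SNR-style metric are assumptions made for illustration. This excerpt does not specify the paper's network architecture or training loss, and the channel average below is only a trivial stand-in for a trained estimator.

```python
import numpy as np


def make_training_pair(recording, input_channels, target_channel):
    """Split a (channels, samples) recording into input channels and a held-out target.

    The held-out channel plays the role of the virtual microphone: it is
    observed at training time but is not given to the estimator as input.
    """
    recording = np.asarray(recording, dtype=np.float64)
    if target_channel in input_channels:
        raise ValueError("target channel must be held out from the inputs")
    inputs = recording[list(input_channels)]  # shape: (n_inputs, samples)
    target = recording[target_channel]        # shape: (samples,)
    return inputs, target


def snr_db(target, estimate, eps=1e-12):
    """Time-domain signal-to-noise ratio in dB (an assumed metric, not the paper's)."""
    target = np.asarray(target, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    error = target - estimate
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(error ** 2) + eps))


def average_baseline(inputs):
    """Hypothetical placeholder for a trained estimator: mean of the input channels."""
    return inputs.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 3-channel "recording": a shared source plus per-channel noise.
    source = rng.standard_normal(16000)
    recording = np.stack([source + 0.1 * rng.standard_normal(16000) for _ in range(3)])

    # Channels 0 and 2 are the real microphones; channel 1 is the virtual one.
    inputs, target = make_training_pair(recording, input_channels=(0, 2), target_channel=1)
    estimate = average_baseline(inputs)
    print(f"baseline SNR: {snr_db(target, estimate):.1f} dB")
```

In an actual training loop, `average_baseline` would be replaced by a time-domain neural network whose output is compared against `target`. Because the target is simply another recorded channel, no simulated or separately measured clean reference is needed.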
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques
