PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays

Takuya Yoshioka, Xiaofei Wang, and Dongmei Wang

arXiv:2201.09586 · eess.AS · January 25, 2022

Open Access

TL;DR

PickNet is a neural network that selects the best microphone channel in real time from a set of devices, improving speech quality and recognition accuracy in ad hoc microphone arrays.

Contribution

The paper introduces PickNet, a novel neural network model for real-time channel selection that is robust, computationally efficient, and suitable for ad hoc microphone arrays.

Findings

01 Significant reduction in word error rate in speech recognition tasks.

02 Improved signal-to-noise and direct-to-reverberation ratios.

03 Robust performance across varying acoustic conditions.

Abstract

This paper proposes PickNet, a neural network model for real-time channel selection for an ad hoc microphone array consisting of multiple recording devices like cell phones. Assuming at most one person to be vocally active at each time point, PickNet identifies the device that is spatially closest to the active person for each time frame by using a short spectral patch of just hundreds of milliseconds. The model is applied to every time frame, and the short time frame signals from the selected microphones are concatenated across the frames to produce an output signal. As the personal devices are usually held close to their owners, the output signal is expected to have higher signal-to-noise and direct-to-reverberation ratios on average than the input signals. Since PickNet utilizes only limited acoustic context at each time frame, the system using the proposed model works in real time…
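The per-frame select-and-concatenate step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `select_and_concatenate` and `frame_energy` are hypothetical, and the scoring function is a simple frame-energy stand-in for the neural network, which in the paper operates on a short spectral patch rather than raw sample energy. Any per-frame scorer could be passed in its place.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


def frame_energy(frame: Frame) -> float:
    """Stand-in score (assumption): mean squared amplitude, NOT the PickNet model."""
    return sum(x * x for x in frame) / max(len(frame), 1)


def select_and_concatenate(
    channels: Sequence[Sequence[float]],
    frame_len: int,
    score: Callable[[Frame], float] = frame_energy,
) -> Tuple[List[float], List[int]]:
    """Pick one channel per frame and stitch the chosen frames together.

    Returns the output signal and the index of the channel chosen per frame.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    # Use the common length so every channel covers every frame.
    n_samples = min(len(ch) for ch in channels)
    output: List[float] = []
    picks: List[int] = []
    for start in range(0, n_samples, frame_len):
        end = min(start + frame_len, n_samples)
        # Score the same time span on every device.
        scores = [score(ch[start:end]) for ch in channels]
        best = max(range(len(channels)), key=scores.__getitem__)
        picks.append(best)
        # Append the winning device's samples for this frame.
        output.extend(channels[best][start:end])
    return output, picks


# Two toy "devices": the first is louder early on, the second later.
phone_a = [0.9, -0.8, 0.1, 0.0]
phone_b = [0.1, 0.0, 0.7, -0.6]
out, picks = select_and_concatenate([phone_a, phone_b], frame_len=2)
```

Because each decision depends only on the current frame, the loop can run as audio arrives. That is the property the abstract relies on for real-time operation, although the actual model uses a context window of hundreds of milliseconds.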

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis

Methods: High-Order Consensuses