PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays
Takuya Yoshioka, Xiaofei Wang, and Dongmei Wang

TL;DR
PickNet is a neural network that selects, in real time, the best microphone channel from a set of recording devices, improving speech quality and recognition accuracy in ad hoc microphone arrays.
Contribution
The paper introduces PickNet, a novel neural network model for real-time channel selection that is robust, computationally efficient, and suitable for ad hoc microphone arrays.
Findings
Significant reduction in word error rate in speech recognition tasks.
Improved signal-to-noise and direct-to-reverberation ratios.
Robust performance across varying acoustic conditions.
Abstract
This paper proposes PickNet, a neural network model for real-time channel selection for an ad hoc microphone array consisting of multiple recording devices like cell phones. Assuming at most one person to be vocally active at each time point, PickNet identifies the device that is spatially closest to the active person for each time frame by using a short spectral patch of just hundreds of milliseconds. The model is applied to every time frame, and the short time frame signals from the selected microphones are concatenated across the frames to produce an output signal. As the personal devices are usually held close to their owners, the output signal is expected to have higher signal-to-noise and direct-to-reverberation ratios on average than the input signals. Since PickNet utilizes only limited acoustic context at each time frame, the system using the proposed model works in real time…
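The select-and-concatenate step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frame-energy scorer is a hypothetical stand-in for the PickNet model (which operates on a short spectral patch), and the function names, the frame length, and the assumption that all channels are already time-aligned are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]  # samples of one short time frame from one device


def energy_scorer(frames: Sequence[Frame]) -> List[float]:
    """Hypothetical stand-in for PickNet: score each channel by frame energy."""
    return [sum(s * s for s in frame) for frame in frames]


def select_and_concatenate(
    channels: Sequence[Sequence[float]],
    frame_len: int,
    scorer: Callable[[Sequence[Frame]], List[float]] = energy_scorer,
) -> Tuple[List[float], List[int]]:
    """Pick one channel per frame and concatenate the picked frames.

    Assumes the channels are time-aligned (an assumption of this sketch).
    Returns the output signal and the index of the channel chosen per frame.
    """
    n_samples = min(len(c) for c in channels)
    output: List[float] = []
    picks: List[int] = []
    for start in range(0, n_samples, frame_len):
        # Cut the same time frame out of every channel.
        frames = [list(c[start:start + frame_len]) for c in channels]
        scores = scorer(frames)
        # Choose the highest-scoring channel (first one wins on ties).
        best = max(range(len(frames)), key=scores.__getitem__)
        picks.append(best)
        output.extend(frames[best])
    return output, picks


# Two toy "devices": device 1 is louder in the first frame, device 0 in the second.
channels = [[0.1, 0.1, 0.9, 0.9], [0.5, 0.5, 0.2, 0.2]]
signal, picks = select_and_concatenate(channels, frame_len=2)
print(picks)   # [1, 0]
print(signal)  # [0.5, 0.5, 0.9, 0.9]
```

Passing the scorer as a parameter keeps the per-frame selection loop separate from the decision rule, so a trained model could be swapped in for the energy heuristic without touching the concatenation logic.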
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
Methods: High-Order Consensuses
