TL;DR
This paper introduces quaternion neural networks for multi-channel distant speech recognition, effectively modeling inter-channel dependencies and improving recognition accuracy over traditional real-valued models.
Contribution
The paper presents a novel quaternion neural network architecture, specifically a quaternion LSTM, for processing multi-channel audio in distant speech recognition tasks.
Findings
Quaternion LSTM outperforms real-valued LSTM in recognition accuracy.
Quaternion algebra effectively models inter-channel dependencies.
The approach improves robustness in noisy, reverberant environments.
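For reference, the Hamilton product behind these findings can be written out in full. This is the standard textbook definition, not a formula quoted from the paper, and the symbols $p$, $q$, $r_n$, $x_n$, $y_n$, $z_n$ are notation chosen here. Every output component combines all four components of both operands, which is one way to read the claim that quaternion algebra captures inter-channel dependencies.

```latex
\begin{aligned}
p &= r_1 + x_1 i + y_1 j + z_1 k, \qquad q = r_2 + x_2 i + y_2 j + z_2 k, \\
p \otimes q &= (r_1 r_2 - x_1 x_2 - y_1 y_2 - z_1 z_2) \\
 &\quad + (r_1 x_2 + x_1 r_2 + y_1 z_2 - z_1 y_2)\, i \\
 &\quad + (r_1 y_2 - x_1 z_2 + y_1 r_2 + z_1 x_2)\, j \\
 &\quad + (r_1 z_2 + x_1 y_2 - y_1 x_2 + z_1 r_2)\, k
\end{aligned}
```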
Abstract
Despite the significant progress in automatic speech recognition (ASR), distant ASR remains challenging due to noise and reverberation. A common approach to mitigate this issue consists of equipping the recording devices with multiple microphones that capture the acoustic scene from different perspectives. These multi-channel audio recordings contain specific internal relations between each signal. In this paper, we propose to capture these inter- and intra-structural dependencies with quaternion neural networks, which can jointly process multiple signals as whole quaternion entities. The quaternion algebra replaces the standard dot product with the Hamilton one, thus offering a simple and elegant way to model dependencies between elements. The quaternion layers are then coupled with a recurrent neural network, which can learn long-term dependencies in the time domain. We show that a…
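A minimal NumPy sketch of how a layer might replace the dot product with the Hamilton product. It is not the authors' implementation. The function names `hamilton_product` and `quaternion_linear`, the layer sizes, and the choice to place one microphone's feature in each quaternion component are assumptions made here for illustration.

```python
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of quaternions stored as (..., 4) arrays [r, i, j, k]."""
    r1, x1, y1, z1 = np.moveaxis(np.asarray(p), -1, 0)
    r2, x2, y2, z2 = np.moveaxis(np.asarray(q), -1, 0)
    return np.stack([
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,  # real part
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,  # i part
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,  # j part
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,  # k part
    ], axis=-1)

def quaternion_linear(W, x):
    """Hypothetical quaternion dense layer (no bias, no activation).

    W: (n_out, n_in, 4) quaternion weights
    x: (n_in, 4) quaternion inputs
    Returns (n_out, 4): out[o] = sum over i of W[o, i] (x) x[i].
    """
    return hamilton_product(W, x[None, :, :]).sum(axis=1)

rng = np.random.default_rng(0)
n_in, n_out = 40, 8                      # assumed sizes, not from the paper

# Assumed packing: four microphones, one feature vector each.
mics = rng.standard_normal((4, n_in))
x = mics.T                               # (n_in, 4): one quaternion per feature

W = rng.standard_normal((n_out, n_in, 4))
y = quaternion_linear(W, x)
print(y.shape)                           # (8, 4)

# Parameter count versus a real-valued dense layer of the same width.
quaternion_params = W.size               # 4 * n_in * n_out
real_params = (4 * n_in) * (4 * n_out)   # 16 * n_in * n_out
print(real_params // quaternion_params)  # 4
```

Because each weight quaternion is reused across all four output components, the layer mixes the four channels while using a quarter of the parameters of a real-valued dense layer with the same input and output width.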
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
