Quaternion Neural Networks for Multi-channel Distant Speech Recognition

Xinchi Qiu; Titouan Parcollet; Mirco Ravanelli; Nicholas Lane; Mohamed Morchid

arXiv:2005.08566·eess.AS·May 20, 2020

TL;DR

This paper introduces quaternion neural networks for multi-channel distant speech recognition, effectively modeling inter-channel dependencies and improving recognition accuracy over traditional real-valued models.

Contribution

The paper presents a novel quaternion neural network architecture, specifically a quaternion LSTM, for processing multi-channel audio in distant speech recognition tasks.

Findings

01. Quaternion LSTM outperforms real-valued LSTM in recognition accuracy.
02. Quaternion algebra effectively models inter-channel dependencies.
03. The approach improves robustness in noisy, reverberant environments.

Abstract

Despite the significant progress in automatic speech recognition (ASR), distant ASR remains challenging due to noise and reverberation. A common approach to mitigate this issue consists of equipping the recording devices with multiple microphones that capture the acoustic scene from different perspectives. These multi-channel audio recordings contain specific internal relations between each signal. In this paper, we propose to capture these inter- and intra- structural dependencies with quaternion neural networks, which can jointly process multiple signals as whole quaternion entities. The quaternion algebra replaces the standard dot product with the Hamilton one, thus offering a simple and elegant way to model dependencies between elements. The quaternion layers are then coupled with a recurrent neural network, which can learn long-term dependencies in the time domain. We show that a…
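The Hamilton product mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hamilton_product`, the tuple layout `(r, x, y, z)`, and the sample values are assumptions made for illustration only.

```python
from typing import Tuple

# A quaternion r + x*i + y*j + z*k stored as (r, x, y, z).
# This layout is an illustrative assumption, not taken from the paper's code.
Quaternion = Tuple[float, float, float, float]


def hamilton_product(p: Quaternion, q: Quaternion) -> Quaternion:
    """Return the Hamilton product p (x) q of two quaternions."""
    r1, x1, y1, z1 = p
    r2, x2, y2, z2 = q
    return (
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,  # real component
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,  # i component
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,  # j component
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,  # k component
    )


# Sanity check of the basis rules: i * j = k, but j * i = -k (non-commutative).
i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
print(hamilton_product(i, j))  # (0, 0, 0, 1)
print(hamilton_product(j, i))  # (0, 0, 0, -1)

# Hypothetical example: one feature value from each of four microphones,
# packed into a single quaternion and transformed by one quaternion weight.
frame = (0.2, -0.1, 0.4, 0.3)   # made-up microphone features
weight = (0.5, 0.1, -0.2, 0.3)  # made-up quaternion weight
print(hamilton_product(weight, frame))
```

Every output component depends on all four input components, so a single quaternion weight (four real parameters) mixes the four channels, where an unconstrained real-valued 4×4 matrix would need sixteen. This shared-weight structure is the sense in which the Hamilton product can relate the elements of a multi-channel input.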

Code & Models

Repositories

mravanelli/pytorch-kaldi (PyTorch, official)

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory