Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition

Titouan Parcollet; Ying Zhang; Mohamed Morchid; Chiheb Trabelsi; Georges Linarès; Renato De Mori; Yoshua Bengio

arXiv:1806.07789·cs.SD·June 21, 2018

TL;DR

This paper introduces quaternion-valued convolutional neural networks for end-to-end speech recognition, leveraging quaternion algebra to process multidimensional features more effectively than traditional real-valued models.

Contribution

The paper proposes integrating multiple feature views of each time frame into quaternion-valued CNNs for sequence-to-sequence speech recognition with CTC, and reports improved performance with fewer parameters than real-valued CNNs.

Findings

01 Lower phoneme error rate (PER) on the TIMIT corpus than real-valued CNNs

02 Fewer trainable parameters than real-valued CNNs

03 Effective processing of multidimensional speech features as single entities
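
To make the second finding concrete, here is a rough weight count for a single 2D convolution layer. It is a back-of-the-envelope sketch, not the paper's own accounting: the function names are hypothetical, biases are ignored, and it assumes that each quaternion weight stores four real numbers and connects one group of four input channels to one group of four output channels.

```python
def real_conv_params(in_channels, out_channels, kernel_h, kernel_w):
    """Weights in a real-valued 2D convolution (bias ignored)."""
    return in_channels * out_channels * kernel_h * kernel_w


def quaternion_conv_params(in_channels, out_channels, kernel_h, kernel_w):
    """Weights in a quaternion 2D convolution (bias ignored).

    Assumption: real channels are grouped in fours, and each quaternion
    weight between an input group and an output group holds 4 real numbers.
    """
    if in_channels % 4 or out_channels % 4:
        raise ValueError("channel counts must be multiples of 4")
    quaternion_weights = (in_channels // 4) * (out_channels // 4) * kernel_h * kernel_w
    return 4 * quaternion_weights


real = real_conv_params(32, 64, 3, 3)
quat = quaternion_conv_params(32, 64, 3, 3)
print(real, quat, real // quat)
```

Under these assumptions, a layer with 32 input channels, 64 output channels, and a 3x3 kernel has 18432 real-valued weights versus 4608 in the quaternion version, a factor of four.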

Abstract

Recently, the connectionist temporal classification (CTC) model, coupled with recurrent (RNN) or convolutional neural networks (CNN), has made it easier to train speech recognition systems in an end-to-end fashion. However, in real-valued models, time-frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first- and second-order derivatives, are processed as individual elements, while a natural alternative is to process such components as composed entities. We propose to group such elements in the form of quaternions and to process these quaternions using the established quaternion algebra. Quaternion numbers and quaternion neural networks have shown their efficiency in processing multidimensional inputs as entities, encoding internal dependencies, and solving many tasks with fewer learning parameters than real-valued models. This…
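
The grouping idea in the abstract can be illustrated with a small pure-Python sketch. The Hamilton product below is the standard quaternion multiplication rule. The `pack_frame` helper is hypothetical: placing an energy value and its first and second derivatives in the three imaginary parts, with a zero real part, is an assumption made here for illustration, since the truncated abstract does not spell out the layout.

```python
def hamilton_product(p, q):
    """Multiply two quaternions given as (r, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    )


def pack_frame(energy, delta, delta_delta):
    """Hypothetical packing: one feature and its derivatives as one quaternion.

    Assumed layout (not confirmed by the text above): zero real part,
    energy on i, first derivative on j, second derivative on k.
    """
    return (0.0, energy, delta, delta_delta)


# A single quaternion weight mixes all three views of the feature at once.
weight = (0.5, 0.1, -0.2, 0.3)  # arbitrary illustrative values
feature = pack_frame(energy=1.2, delta=0.4, delta_delta=-0.1)
output = hamilton_product(weight, feature)
print(output)
```

Because every output component depends on all four components of both operands, one quaternion weight ties the views of a feature together, which is the sense in which the inputs are processed as entities rather than as independent real values.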

Code & Models

Repositories

Riccardo-Vecchi/Pytorch-Quaternion-Neural-Networks (PyTorch)
