End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks

Dimitri Palaz; Ronan Collobert; Mathew Magimai.-Doss

arXiv:1312.2137 · cs.LG · December 10, 2013 · 39 citations

TL;DR

This paper demonstrates that convolutional neural networks can learn to recognize phoneme sequences directly from raw speech signals, with performance comparable to traditional MFCC-based systems, reducing reliance on hand-crafted features.

Contribution

It introduces an end-to-end CNN approach for raw speech phoneme recognition, challenging the necessity of complex feature extraction.

Findings

01 Performance comparable to MFCC-based systems on the TIMIT and WSJ datasets

02 CNNs can learn directly from raw speech signals

03 Reduces the need for hand-crafted features

Abstract

Most state-of-the-art phoneme recognition systems rely on classical neural network classifiers fed with highly tuned features, such as MFCC or PLP features. Recent advances in "deep learning" approaches have called such systems into question, but while some attempts have been made with simpler features such as spectrograms, state-of-the-art systems still rely on MFCCs. This might be viewed as a failure of deep learning approaches, which are often claimed to be able to train on raw signals, alleviating the need for hand-crafted features. In this paper, we investigate a convolutional neural network approach for raw speech signals. While convolutional architectures have had tremendous success in computer vision and text processing, they seem to have been neglected in recent years in the speech processing field. We show that it is possible to learn an end-to-end phoneme sequence…
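The general idea in the abstract, a convolutional network that reads raw speech samples rather than MFCC or PLP features, can be illustrated with a toy forward pass. This is only a sketch under stated assumptions, not the authors' architecture: the kernel width, stride, pooling size, tanh nonlinearity, three-class output, and the names `conv1d`, `max_pool`, and `phoneme_posteriors` are all hypothetical choices made for illustration, and the weights are random and untrained.

```python
import math
import random


def conv1d(signal, kernels, stride):
    """Valid 1-D convolution over raw samples: one output row per kernel."""
    width = len(kernels[0])
    n_out = (len(signal) - width) // stride + 1
    return [
        [sum(k[j] * signal[t * stride + j] for j in range(width))
         for t in range(n_out)]
        for k in kernels
    ]


def max_pool(feature_maps, size):
    """Non-overlapping max pooling along the time axis of each feature map."""
    return [
        [max(row[i:i + size]) for i in range(0, len(row) - size + 1, size)]
        for row in feature_maps
    ]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def phoneme_posteriors(signal, kernels, out_weights, stride=10, pool=3):
    """Map raw samples to one class distribution per pooled time frame."""
    maps = max_pool(conv1d(signal, kernels, stride), pool)
    maps = [[math.tanh(v) for v in row] for row in maps]
    posteriors = []
    for t in range(len(maps[0])):
        features = [row[t] for row in maps]
        scores = [sum(w * f for w, f in zip(ws, features)) for ws in out_weights]
        posteriors.append(softmax(scores))
    return posteriors


# Toy setup (all sizes are assumptions): 4 filters of 30 samples, 3 classes.
rng = random.Random(0)
N_KERNELS, KERNEL_WIDTH, N_CLASSES = 4, 30, 3
kernels = [[rng.uniform(-0.1, 0.1) for _ in range(KERNEL_WIDTH)]
           for _ in range(N_KERNELS)]
out_weights = [[rng.uniform(-1, 1) for _ in range(N_KERNELS)]
               for _ in range(N_CLASSES)]

# Stand-in for speech: 0.1 s of a 440 Hz tone sampled at 16 kHz.
signal = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]

posteriors = phoneme_posteriors(signal, kernels, out_weights)
print(len(posteriors), "frames,", len(posteriors[0]), "classes per frame")
```

The point of the sketch is that the first layer operates on waveform samples directly, so any filterbank-like behaviour would have to be learned in `kernels` during training rather than fixed in advance by a feature extractor.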

Code & Models

Repositories

iSkaCh/PhonemeRecog-Without-MFCC-

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques