Estimating Phoneme Class Conditional Probabilities from Raw Speech Signal using Convolutional Neural Networks

Dimitri Palaz; Ronan Collobert; Mathew Magimai.-Doss

arXiv:1304.1018·cs.LG·June 13, 2013

TL;DR

This paper explores using convolutional neural networks to directly estimate phoneme class probabilities from raw speech signals, eliminating the need for traditional feature extraction, and demonstrates comparable or improved recognition performance.

Contribution

It introduces a novel CNN-based method for phoneme recognition directly from raw speech, bypassing traditional feature extraction steps.

Findings

01

CNNs can automatically learn relevant features from raw speech

02

The proposed approach achieves comparable or better accuracy than traditional methods

03

CNN-based models simplify the phoneme recognition pipeline

Abstract

In a hybrid hidden Markov model/artificial neural network (HMM/ANN) automatic speech recognition (ASR) system, the phoneme class conditional probabilities are estimated by first extracting acoustic features from the speech signal based on prior knowledge, such as speech perception and/or speech production knowledge, and then modeling the acoustic features with an ANN. Recent advances in machine learning techniques, more specifically in the fields of image processing and text processing, have shown that such a divide-and-conquer strategy (i.e., separating the feature extraction and modeling steps) may not be necessary. Motivated by these studies, this paper investigates, in the framework of convolutional neural networks (CNNs), a novel approach in which the input to the ANN is the raw speech signal and the output is the phoneme class conditional probability estimates. On TIMIT phoneme recognition task,…
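The idea in the abstract, a network that maps a window of raw speech samples directly to phoneme class conditional probabilities, can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions, not the authors' architecture: the layer sizes, kernel widths, strides, pooling widths, tanh nonlinearity, 1600-sample input window, 39 output classes, and random untrained weights are all hypothetical choices made only for illustration, and training is omitted entirely.

```python
import numpy as np


def conv1d(x, w, b, stride):
    """Valid 1-D convolution. x: (in_ch, T), w: (out_ch, in_ch, k), b: (out_ch,)."""
    k = w.shape[2]
    n_frames = (x.shape[1] - k) // stride + 1
    # Gather sliding windows over time: (n_frames, in_ch, k)
    windows = np.stack([x[:, i * stride:i * stride + k] for i in range(n_frames)])
    # Contract over input channels and kernel taps -> (out_ch, n_frames)
    return np.einsum("fik,oik->of", windows, w) + b[:, None]


def max_pool1d(x, width):
    """Non-overlapping max pooling along time. x: (ch, T)."""
    n = x.shape[1] // width
    return x[:, :n * width].reshape(x.shape[0], n, width).max(axis=2)


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def init_params(rng, n_samples=1600, n_classes=39, n_hidden=64):
    """Random weights for a hypothetical two-stage CNN followed by an MLP."""
    # (out_ch, in_ch, kernel, stride, pool): illustrative values, not from the paper
    specs = [(8, 1, 30, 10, 2), (16, 8, 5, 1, 2)]
    conv, t = [], n_samples
    for out_ch, in_ch, k, stride, pool in specs:
        w = rng.normal(0.0, 0.1, (out_ch, in_ch, k))
        conv.append((w, np.zeros(out_ch), stride, pool))
        t = ((t - k) // stride + 1) // pool  # time steps left after conv + pool
    flat = specs[-1][0] * t
    hidden = (rng.normal(0.0, 0.1, (n_hidden, flat)), np.zeros(n_hidden))
    out = (rng.normal(0.0, 0.1, (n_classes, n_hidden)), np.zeros(n_classes))
    return {"conv": conv, "hidden": hidden, "out": out}


def phoneme_posteriors(waveform, params):
    """Map a raw waveform window (1-D array) to a probability vector over classes."""
    h = waveform[None, :]  # a single input channel: (1, T)
    for w, b, stride, pool in params["conv"]:
        # Learned filters replace hand-crafted feature extraction
        h = np.tanh(max_pool1d(conv1d(h, w, b, stride), pool))
    w_h, b_h = params["hidden"]
    h = np.tanh(w_h @ h.reshape(-1) + b_h)
    w_o, b_o = params["out"]
    return softmax(w_o @ h + b_o)


rng = np.random.default_rng(0)
params = init_params(rng)
window = rng.normal(size=1600)  # stand-in for 100 ms of 16 kHz speech
posteriors = phoneme_posteriors(window, params)
print(posteriors.shape, float(posteriors.sum()))
```

The convolution and pooling stages take the place of the hand-designed feature extraction step, and the final softmax yields one probability per phoneme class for the input window. With untrained weights the output is meaningless as a prediction; the sketch only shows the data flow from raw samples to a probability vector.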
