Learning An Invariant Speech Representation

Georgios Evangelopoulos; Stephen Voinea; Chiyuan Zhang; Lorenzo Rosasco; Tomaso Poggio

arXiv:1406.3884 · cs.SD · June 17, 2014 · 2 cites

Open Access

TL;DR

This paper introduces an invariant speech representation, inspired by a theory of invariant visual representations, that learns features robust to intraclass transformations in order to improve phoneme classification accuracy and reduce sample complexity.

Contribution

It extends a theory of invariant visual representations to speech, proposing a template-based, quasi-invariant feature extraction approach for small-sample speech recognition.

Findings

01. Improved vowel classification accuracy.

02. Reduced sample complexity compared to standard features.

03. Effective hierarchical architecture extension.

Abstract

Recognition of speech, and in particular the ability to generalize and learn from small sets of labelled examples like humans do, depends on an appropriate representation of the acoustic input. We formulate the problem of finding robust speech features for supervised learning with small sample complexity as a problem of learning representations of the signal that are maximally invariant to intraclass transformations and deformations. We propose an extension of a theory for unsupervised learning of invariant visual representations to the auditory domain and empirically evaluate its validity for voiced speech sound classification. Our version of the theory requires the memory-based, unsupervised storage of acoustic templates -- such as specific phones or words -- together with all the transformations of each that normally occur. A quasi-invariant representation for a speech segment can be…
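
As a rough illustration of the template-based idea described above, the following minimal sketch stores each template together with its transformed copies and summarizes a new segment by how it matches those stored copies. Everything specific here is an assumption made for illustration and is not taken from the text: circular shifts stand in for the transformations, matching is a normalized dot product, pooling is a histogram, and the names `template_orbit` and `signature` are hypothetical.

```python
import numpy as np

def template_orbit(template, shifts):
    """Stack circularly shifted copies of one template (a toy set of transformations)."""
    return np.stack([np.roll(template, s) for s in shifts])

def signature(x, orbits, bins=8):
    """Pool normalized dot products with each stored orbit into one histogram per template."""
    x = x / np.linalg.norm(x)
    feats = []
    for orbit in orbits:
        # Normalize every stored transformed template to unit length.
        unit = orbit / np.linalg.norm(orbit, axis=1, keepdims=True)
        # One similarity per stored transformation, each in [-1, 1].
        proj = unit @ x
        hist, _ = np.histogram(proj, bins=bins, range=(-1.0, 1.0))
        feats.append(hist / proj.size)
    return np.concatenate(feats)

rng = np.random.default_rng(0)
dim = 32
# Random vectors stand in for stored acoustic templates (assumption).
templates = rng.standard_normal((4, dim))
# Store every circular shift of each template.
orbits = [template_orbit(t, range(dim)) for t in templates]

x = rng.standard_normal(dim)
sig_x = signature(x, orbits)
sig_shifted = signature(np.roll(x, 5), orbits)
```

Because each stored orbit here contains every circular shift, shifting the input only permutes the dot products, so the pooled histograms of `x` and its shifted copy agree. With real speech, the stored transformations would only approximate the true variability, so one would expect invariance to hold only approximately.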

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis