A Comparison of Adaptation Techniques and Recurrent Neural Network Architectures

Jan Vanek; Josef Michalek; Jan Zelinka; Josef Psutka

arXiv:1807.06441 · eess.AS · July 18, 2018 · 1 citation

TL;DR

This paper compares various neural network architectures and adaptation techniques for acoustic modeling in speech recognition, highlighting the effectiveness of certain methods with RNNs on the TIMIT dataset.

Contribution

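It provides a comprehensive comparison of five neural network architectures combined with feature-space maximum likelihood linear regression (fMLLR), five variants of i-vector adaptation, and two variants of cepstral mean normalization for RNN-based speech recognition.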

Findings

1. Not all adaptation techniques that are effective for feed-forward NNs also work with RNNs.
2. Certain adaptation and normalization methods improved RNN performance on TIMIT.
3. Open-source scripts enable easy replication and further research.

Abstract

Recently, recurrent neural networks have become the state of the art in acoustic modeling for automatic speech recognition. Long short-term memory (LSTM) units are the most popular choice; however, alternative units such as the gated recurrent unit (GRU) and its modifications have outperformed LSTM in some publications. In this paper, we compare five neural network (NN) architectures with various adaptation and feature normalization techniques. We evaluate feature-space maximum likelihood linear regression, five variants of i-vector adaptation and two variants of cepstral mean normalization. Most adaptation and normalization techniques were developed for feed-forward NNs and, according to the results in this paper, not all of them also work with RNNs. For the experiments, we chose the well-known and widely available TIMIT phone recognition task. Phone recognition is much more sensitive to the…
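Cepstral mean normalization (CMN), one of the feature normalizations evaluated, subtracts a mean cepstral vector from every frame: x̂_t = x_t - μ, where μ is the average of the feature vectors over some set of frames. The abstract does not say which two CMN variants were compared, so the minimal sketch below assumes two common choices: a mean computed per utterance and a mean pooled over all utterances of one speaker. The function names `cmn_per_utterance` and `cmn_per_speaker` are hypothetical and are not taken from the authors' scripts.

```python
import numpy as np


def cmn_per_utterance(features: np.ndarray) -> np.ndarray:
    """Per-utterance CMN (assumed variant): subtract the utterance's own mean.

    features: array of shape (num_frames, num_dims), e.g. MFCC frames.
    """
    return features - features.mean(axis=0, keepdims=True)


def cmn_per_speaker(utterances: list[np.ndarray]) -> list[np.ndarray]:
    """Per-speaker CMN (assumed variant): subtract one mean vector pooled
    over all frames of all utterances that belong to the same speaker.
    """
    speaker_mean = np.concatenate(utterances, axis=0).mean(axis=0, keepdims=True)
    return [utt - speaker_mean for utt in utterances]


if __name__ == "__main__":
    # Two toy "utterances" from one speaker, 2 frames x 3 cepstral dims each.
    utt_a = np.ones((2, 3))
    utt_b = 3.0 * np.ones((2, 3))

    # Per-utterance CMN removes each utterance's own offset: both become all zeros.
    print(cmn_per_utterance(utt_a))
    print(cmn_per_utterance(utt_b))

    # Per-speaker CMN removes the pooled mean (2.0): utt_a -> -1.0, utt_b -> +1.0.
    norm_a, norm_b = cmn_per_speaker([utt_a, utt_b])
    print(norm_a)
    print(norm_b)
```

The per-speaker variant pools more frames into the mean estimate, but it requires knowing which utterances share a speaker at test time, whereas the per-utterance variant needs only the current utterance.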

Code & Models

Repositories

OrcusCZ/NNAcousticModeling (official)

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Attention Model · Sigmoid Activation · Tanh Activation · Long Short-Term Memory