A Survey of Recent DNN Architectures on the TIMIT Phone Recognition Task

Josef Michalek; Jan Vanek

arXiv:1806.07974 · cs.CL · June 22, 2018

TL;DR

This survey evaluates recent deep neural network architectures on the TIMIT phone recognition task, providing a baseline with open-source scripts and achieving lower error rates than previous reports.

Contribution

It offers a comprehensive comparison of recent DNN architectures on TIMIT, establishing a new baseline with open-source code and improved performance.

Findings

01. Reports a phone error rate (PER) on TIMIT that is, to the authors' knowledge, lower than previously published results.

02. Provides open-source scripts for replicating the baseline results.

03. Compares several recent DNN architectures and identifies the best performers.

Abstract

In this survey paper, we evaluate several recent deep neural network (DNN) architectures on the TIMIT phone recognition task. We chose the TIMIT corpus for its popularity and broad availability in the community. It also simulates a low-resource scenario, which is relevant to minor languages. We prefer the phone recognition task because it is much more sensitive to acoustic model quality than a large vocabulary continuous speech recognition (LVCSR) task. In recent years, many published DNN papers have reported results on TIMIT. However, the reported phone error rates (PERs) were often much higher than the PER of a simple feed-forward (FF) DNN. That was the main motivation for this paper: to provide baseline DNNs with the lowest possible PERs, together with open-source scripts that let future papers easily replicate the baseline results. To our knowledge, the best-achieved PER of this…
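The abstract compares systems by phone error rate (PER) without spelling out how it is computed. As a minimal sketch, assuming the standard definition (substitutions, deletions and insertions from a Levenshtein alignment, divided by the number of reference phones), the metric could be computed as below. The function names `edit_distance` and `phone_error_rate` are illustrative and are not taken from the paper or its repository; the sketch applies no phone-set mapping and may differ from the scoring setup the authors used.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference phone
                curr[j - 1] + 1,         # insertion of a hypothesis phone
                prev[j - 1] + (r != h),  # substitution, or match at zero cost
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """PER in percent, pooled over all utterances (illustrative helper)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference contains no phones")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_errors / total_phones


# Toy example (made-up phone strings, not TIMIT data):
# one substitution (ae -> eh) and one deletion (final sil) over 5 reference phones.
reference = [["sil", "k", "ae", "t", "sil"]]
hypothesis = [["sil", "k", "eh", "t"]]
per = phone_error_rate(reference, hypothesis)
```

Errors are pooled across utterances before dividing, so long utterances weigh more than short ones; averaging per-utterance rates would give a different number.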

Code & Models

Repositories

OrcusCZ/NNAcousticModeling (official)
