TL;DR
This survey evaluates recent deep neural network architectures on the TIMIT phone recognition task, providing a baseline with open-source scripts and achieving lower error rates than previous reports.
Contribution
It offers a comprehensive comparison of recent DNN architectures on TIMIT and establishes a new baseline, with open-source code and lower phone error rates than previously reported.
Findings
Reported what the authors state is, to their knowledge, the lowest phone error rate (PER) achieved on TIMIT.
Provided open-source scripts for replicating baseline results.
Compared various DNN architectures and identified top performers.
Abstract
In this survey paper, we evaluate several recent deep neural network (DNN) architectures on the TIMIT phone recognition task. We chose the TIMIT corpus because of its popularity and broad availability in the community. It also simulates a low-resource scenario, which is relevant for minor languages. We also prefer the phone recognition task because it is much more sensitive to acoustic model quality than a large vocabulary continuous speech recognition (LVCSR) task. In recent years, many published DNN papers have reported results on TIMIT. However, the reported phone error rates (PERs) were often much higher than the PER of a simple feed-forward (FF) DNN. That was the main motivation for this paper: to provide baseline DNNs with the lowest possible PERs, together with open-source scripts so that future papers can easily replicate the baseline results. To our knowledge, the best-achieved PER of this…
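The results above are reported as phone error rate (PER). As a rough illustration only, PER is conventionally computed as the Levenshtein (edit) distance between the reference and recognized phone sequences, summed over utterances, divided by the total number of reference phones. The sketch below is a minimal version under stated assumptions: the functions `edit_distance` and `phone_error_rate` are hypothetical names, and it assumes both sequences are already lists of labels from a common phone set. It does not reproduce the paper's own scoring scripts, which may differ in details such as phone-set mapping.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dp[i][0] = i  # delete all i reference phones
    for j in range(cols):
        dp[0][j] = j  # insert all j hypothesis phones
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[-1][-1]


def phone_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Corpus-level PER: total edit errors / total reference phones."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference transcriptions contain no phones")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / total


if __name__ == "__main__":
    ref = ["dh", "ax", "k", "ae", "t"]
    hyp = ["dh", "ah", "k", "ae", "t", "s"]  # one substitution, one insertion
    print(phone_error_rate([ref], [hyp]))
```

Summing errors and reference lengths over the whole test set, rather than averaging per-utterance rates, keeps long utterances from being underweighted; the example pair has two errors against five reference phones.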
