Convolutional Attention-based Seq2Seq Neural Network for End-to-End ASR
Dan Lim

TL;DR
This paper presents a convolutional attention-based sequence-to-sequence neural network that uses batch normalization, dropout, and residual connections, and reports a competitive phoneme error rate for end-to-end automatic speech recognition.
Contribution
It introduces a convolutional attention-based seq2seq model that combines Luong's attention mechanism with batch normalization, dropout, and residual connections for end-to-end speech recognition.
Findings
Achieved a 15.8% phoneme error rate on the TIMIT dataset
Demonstrated effectiveness of the proposed model for end-to-end ASR
Combined batch normalization, dropout, and residual connections in a single convolutional seq2seq model
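For readers unfamiliar with the metric, phoneme error rate is conventionally the edit distance between the predicted and reference phoneme sequences, divided by the reference length. The sketch below assumes that conventional definition; the scoring setup used in the paper (phoneme set, folding, scoring tool) is not stated on this page, and the function names and sample phonemes are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Total edit distance over all utterances divided by total reference length."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_ref = sum(len(r) for r in refs)
    return total_errors / total_ref


# Hypothetical utterance: one substitution (ae -> ah) and one deletion (final sil).
refs = [["sil", "k", "ae", "t", "sil"]]
hyps = [["sil", "k", "ah", "t"]]
per = phoneme_error_rate(refs, hyps)
```

Under this definition, the toy utterance has 2 errors against 5 reference phonemes.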
Abstract
This thesis introduces a sequence-to-sequence model with Luong's attention mechanism for end-to-end ASR. It also describes the neural network techniques that make up the convolutional attention-based seq2seq network, including batch normalization, dropout, and residual networks. Finally, the proposed model proves effective for speech recognition, achieving a 15.8% phoneme error rate on the TIMIT dataset.
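As a rough illustration of the attention step mentioned in the abstract, the sketch below implements the dot-product scoring variant of Luong's attention in plain Python. This page does not say which of Luong's scoring functions the thesis uses, so the dot variant is an assumption, and the function names and toy vectors are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def luong_dot_attention(decoder_state, encoder_states):
    """Return (context, weights) for one decoder step.

    decoder_state: list of floats, the current decoder hidden state.
    encoder_states: list of same-length lists, one per encoder time step.
    """
    # Score each encoder state by its dot product with the decoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    # Normalize scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(decoder_state)
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states))
               for k in range(dim)]
    return context, weights


# Toy example: the decoder state aligns with the first encoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0]]
decoder_state = [1.0, 0.0]
context, weights = luong_dot_attention(decoder_state, encoder_states)
```

In Luong's formulation the context vector is then combined with the decoder state to produce the output distribution for that step; that projection is omitted here to keep the sketch short.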
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Geophysical Methods and Applications · Fault Detection and Control Systems
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence · Dropout
