Sequence Transduction with Recurrent Neural Networks
Alex Graves

TL;DR
This paper presents an end-to-end, RNN-based sequence transduction system that maps input sequences to output sequences without a pre-defined alignment, demonstrated on phoneme recognition with the TIMIT corpus.
Contribution
It introduces a novel probabilistic RNN-based transduction model that eliminates the need for pre-specified input-output alignments.
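As a rough illustration of what "no pre-specified alignment" can mean in practice, the sketch below sums the probability of every possible alignment between an input of length T and a target of length U with a forward recursion over a T x (U+1) lattice. It assumes the model has already produced, for each lattice position, a log-probability of emitting a blank (move to the next input step) and of emitting the next target label. The function names and the nested-list layout are hypothetical choices made for this example, not taken from the paper's code.

```python
import math


def log_add(a, b):
    """Return log(exp(a) + exp(b)) stably; -inf stands for probability zero."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def transducer_log_likelihood(log_blank, log_label):
    """Log-probability of the target sequence, summed over all alignments.

    log_blank[t][u]: log-prob of emitting blank at input step t after u labels,
                     for t in 0..T-1 and u in 0..U.
    log_label[t][u]: log-prob of emitting target label u+1 at input step t,
                     for t in 0..T-1 and u in 0..U-1.
    (Both layouts are assumptions made for this sketch.)
    """
    T = len(log_blank)
    U = len(log_blank[0]) - 1

    # alpha[t][u] = log-prob of having consumed t input steps and emitted u labels.
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0  # log(1): start of the lattice

    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Arrive by emitting blank at (t-1, u), which advances the input.
            from_blank = alpha[t - 1][u] + log_blank[t - 1][u] if t > 0 else -math.inf
            # Arrive by emitting the u-th label at (t, u-1), which advances the output.
            from_label = alpha[t][u - 1] + log_label[t][u - 1] if u > 0 else -math.inf
            alpha[t][u] = log_add(from_blank, from_label)

    # A final blank at the last lattice node terminates the sequence.
    return alpha[T - 1][U] + log_blank[T - 1][U]


# Toy usage: T=2 input steps, U=1 target label, every emission has probability 0.5.
h = math.log(0.5)
toy_log_blank = [[h, h], [h, h]]
toy_log_label = [[h], [h]]
toy_result = transducer_log_likelihood(toy_log_blank, toy_log_label)
```

In the toy case there are two alignments (label first or blank first), each made of three emissions, so the total probability is 2 x 0.5^3 = 0.25. Training would maximize this quantity, which is why no single alignment has to be chosen in advance.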
Findings
Successful phoneme recognition on the TIMIT corpus
End-to-end training without explicit alignment
Flexible transformation of input to output sequences
Abstract
Many machine learning tasks can be expressed as the transformation (or transduction) of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation since finding the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of…
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
