Sequence Transduction with Recurrent Neural Networks

Alex Graves

arXiv:1211.3711 · cs.NE · November 16, 2012 · 1.3k cites

TL;DR

This paper presents an end-to-end RNN-based sequence transduction system capable of transforming input sequences into output sequences without pre-defined alignments, demonstrated on phoneme recognition tasks.

Contribution

It introduces a novel probabilistic RNN-based transduction model that eliminates the need for pre-specified input-output alignments.
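One way to see how a probabilistic transducer can avoid a pre-specified alignment is to sum the probability of every possible alignment instead of committing to one. The sketch below is a minimal illustration of that idea using the forward recursion commonly associated with RNN transducers; this page does not spell the recursion out, so treat it as an assumption-laden sketch rather than the paper's implementation. The function name `transducer_likelihood` and the tables `blank` and `label` are hypothetical: in a real model the probabilities would come from the trained networks, whereas here they are hand-written placeholders.

```python
from math import comb


def transducer_likelihood(blank, label):
    """Sum the probability of every alignment with a forward recursion.

    blank[t][u]: probability of emitting the null symbol at input step t
                 after u target labels have been emitted (advance to t + 1).
    label[t][u]: probability of emitting target label u + 1 at the same
                 lattice node (advance to u + 1). Only read for u < U.
    Both tables are illustrative placeholders, not network outputs.
    """
    T = len(blank)          # number of input steps
    U = len(blank[0]) - 1   # number of target labels
    alpha = [[0.0] * (U + 1) for _ in range(T)]
    alpha[0][0] = 1.0       # every alignment starts at node (0, 0)
    for t in range(T):
        for u in range(U + 1):
            if t > 0:       # arrive by consuming an input step (null output)
                alpha[t][u] += alpha[t - 1][u] * blank[t - 1][u]
            if u > 0:       # arrive by emitting the next target label
                alpha[t][u] += alpha[t][u - 1] * label[t][u - 1]
    # A final null emission at the last node terminates the sequence.
    return alpha[T - 1][U] * blank[T - 1][U]


def count_alignments(T, U):
    """Number of distinct lattice paths for T input steps and U labels."""
    return comb(T - 1 + U, U)


# Toy lattice: 3 input steps, 2 target labels, every emission has p = 0.5.
toy_blank = [[0.5] * 3 for _ in range(3)]
toy_label = [[0.5] * 3 for _ in range(3)]
print(count_alignments(3, 2))
print(transducer_likelihood(toy_blank, toy_label))
```

Because the sum runs over all alignments, the quantity being maximized in training depends only on the input and target sequences, not on any particular alignment between them. The number of alignments grows combinatorially with sequence length, which is why a dynamic-programming recursion is used here rather than explicit enumeration.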

Findings

01 Successful phoneme recognition on the TIMIT corpus

02 End-to-end training without explicit alignment

03 Flexible transformation of input to output sequences

Abstract

Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech, to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However, RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation, since finding the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of…

Code & Models

Models

🤗 niobures/GigaAM (model)

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling