Alternative structures for character-level RNNs

Piotr Bojanowski; Armand Joulin; Tomas Mikolov

arXiv:1511.06303 · cs.LG · November 25, 2015 · 39 cites

TL;DR

This paper proposes two structural modifications to character-level RNNs to better model long-term dependencies and reduce computational costs, evaluated on multilingual real-world data.

Contribution

It introduces two novel structural modifications to classical RNNs for character-level modeling, improving efficiency and dependency modeling.

Findings

01. Improved modeling of long-term dependencies.

02. Reduced computational costs.

03. Effective on multilingual datasets.

Abstract

Recurrent neural networks are convenient and efficient models for language modeling. However, when applied at the level of characters instead of words, they suffer from several problems. To successfully model long-term dependencies, the hidden representation needs to be large. This in turn implies higher computational costs, which can become prohibitive in practice. We propose two alternative structural modifications to the classical RNN model. The first one consists of conditioning the character-level representation on the previous word representation. The other one uses the character history to condition the output probability. We evaluate the performance of the two proposed modifications on challenging, multilingual real-world data.
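
The two modifications named in the abstract can be illustrated with a small, untrained sketch. This is one possible reading of the abstract, not the authors' implementation: the class name CharRNNSketch, the matrices A, R, Q, U and P, the sigmoid hidden update, the use of a single previous character as the "character history", and the way the word vector is refreshed at spaces are all assumptions made for illustration.

```python
import math
import random


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def column(W, j):
    """Column j of W, equivalent to multiplying W by a one-hot vector."""
    return [row[j] for row in W]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class CharRNNSketch:
    """Hypothetical character-level RNN with both conditioning ideas (untrained)."""

    def __init__(self, vocab, hidden=8, word_dim=4, seed=0):
        rng = random.Random(seed)
        n = len(vocab)
        self.index = {c: i for i, c in enumerate(vocab)}
        self.hidden, self.word_dim = hidden, word_dim
        self.A = rand_matrix(hidden, n, rng)         # current character -> hidden
        self.R = rand_matrix(hidden, hidden, rng)    # previous hidden -> hidden
        self.Q = rand_matrix(hidden, word_dim, rng)  # previous word vector -> hidden
        self.U = rand_matrix(n, hidden, rng)         # hidden -> output logits
        self.P = rand_matrix(n, n, rng)              # previous character -> output logits

    def step(self, char, h, word_vec, prev_char=None):
        # Modification 1 (assumed form): the hidden update sees the previous word vector.
        pre = [a + r + q for a, r, q in zip(column(self.A, self.index[char]),
                                            matvec(self.R, h),
                                            matvec(self.Q, word_vec))]
        h_new = [1.0 / (1.0 + math.exp(-v)) for v in pre]
        logits = matvec(self.U, h_new)
        # Modification 2 (assumed form): character history shifts the output logits.
        if prev_char is not None:
            bias = column(self.P, self.index[prev_char])
            logits = [l + b for l, b in zip(logits, bias)]
        return h_new, softmax(logits)


def run(model, text):
    """Return the next-character distribution after each character of text."""
    h = [0.0] * model.hidden
    word_vec = [0.0] * model.word_dim
    prev, out = None, []
    for ch in text:
        h, probs = model.step(ch, h, word_vec, prev)
        out.append(probs)
        if ch == " ":
            # Crude stand-in for a word representation: a slice of the hidden state.
            word_vec = h[:model.word_dim]
        prev = ch
    return out
```

Calling run(CharRNNSketch(list("ab ")), "ab ba") returns one probability distribution over the three-symbol vocabulary per input character. The weights are random, so the distributions carry no meaning; the sketch only shows where the extra conditioning terms enter the computation.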

Code & Models

Repositories

cloudmcloudyo/capstone (tf)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis