Visualizing and Understanding Recurrent Networks

Andrej Karpathy; Justin Johnson; Li Fei-Fei

arXiv:1506.02078 · cs.LG · November 18, 2015 · 890 cites

TL;DR

This paper analyzes how LSTM-based recurrent neural networks process sequential data, revealing interpretable internal mechanisms and identifying their strengths and limitations in capturing long-range dependencies.

Contribution

The paper provides a detailed analysis of LSTM representations, predictions and error types using character-level language models as a testbed, highlighting interpretable cells that track long-range dependencies.

Findings

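01 Identified interpretable cells that track long-range dependencies such as line lengths, quotes and brackets

02 Compared LSTM performance with finite-horizon n-gram models, tracing the LSTM improvements to long-range structural dependencies

03 Analyzed the remaining error types to suggest directions for future research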
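The first finding can be illustrated with a simple probe: given one cell's activation trace over a character sequence, measure how strongly it correlates with a hand-defined feature such as "inside a quotation". The sketch below is a minimal illustration under stated assumptions, not the paper's procedure: activations are plain Python lists of hypothetical values (no trained model is involved), Pearson correlation is the assumed scoring rule, and the names `inside_quote_mask`, `pearson` and `rank_cells` are made up for this example.

```python
import math


def inside_quote_mask(text: str) -> list[float]:
    """Label each character 1.0 if it lies inside a double-quoted span, else 0.0.

    The state is toggled before the character is recorded, so an opening quote
    is labelled inside and a closing quote outside (a hypothetical convention).
    """
    mask = []
    inside = False
    for ch in text:
        if ch == '"':
            inside = not inside
        mask.append(1.0 if inside else 0.0)
    return mask


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_cells(mask: list[float], cells: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Sort cells by absolute correlation with the feature mask, strongest first."""
    scores = [(name, pearson(mask, acts)) for name, acts in cells.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    text = 'a "bc" d'
    mask = inside_quote_mask(text)
    # Hypothetical activation traces (one value per character), not real model output.
    cells = {
        "cell_quote": [-0.9, -0.8, 0.7, 0.9, 0.8, -0.7, -0.9, -0.8],
        "cell_flat": [0.5] * 8,
    }
    for name, r in rank_cells(mask, cells):
        print(f"{name}: r = {r:+.3f}")
```

A correlation probe only flags candidate cells; inspecting the activations over real text, as the visualizations in the paper do, is still needed to judge whether a cell is genuinely interpretable.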

Abstract

Recurrent Neural Networks (RNNs), and specifically a variant with Long Short-Term Memory (LSTM), are enjoying renewed interest as a result of successful applications in a wide range of machine learning problems that involve sequential data. However, while LSTMs provide exceptional results in practice, the source of their performance and their limitations remain rather poorly understood. Using character-level language models as an interpretable testbed, we aim to bridge this gap by providing an analysis of their representations, predictions and error types. In particular, our experiments reveal the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets. Moreover, our comparative analysis with finite horizon n-gram models traces the source of the LSTM improvements to long-range structural dependencies. Finally, we provide…
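The comparison with finite-horizon n-gram models rests on one property: an order-n character model conditions only on the previous n-1 characters, so it cannot see a dependency, such as an unclosed bracket, that lies farther back. The sketch below is a minimal count-based character n-gram model with add-one (Laplace) smoothing and a bits-per-character evaluation. The smoothing scheme and the class name `CharNGram` are assumptions for illustration; this is not the baseline configuration used in the paper.

```python
import math
from collections import Counter, defaultdict


class CharNGram:
    """Count-based character n-gram model with add-one smoothing (illustrative only)."""

    def __init__(self, n: int, vocab: set[str]) -> None:
        self.n = n
        self.vocab = sorted(vocab)
        # Maps a context string (at most n-1 chars) to counts of the next character.
        self.counts: dict[str, Counter] = defaultdict(Counter)

    def _context(self, history: str) -> str:
        # Finite horizon: only the last n-1 characters of the history are visible.
        return history[max(0, len(history) - (self.n - 1)):]

    def fit(self, text: str) -> None:
        for i, ch in enumerate(text):
            self.counts[self._context(text[:i])][ch] += 1

    def prob(self, history: str, ch: str) -> float:
        # Add-one smoothing: every vocabulary symbol receives one pseudo-count.
        ctx_counts = self.counts.get(self._context(history), Counter())
        total = sum(ctx_counts.values())
        return (ctx_counts[ch] + 1) / (total + len(self.vocab))

    def bits_per_char(self, text: str) -> float:
        # Average negative log2-likelihood of each character given its context.
        nll = 0.0
        for i, ch in enumerate(text):
            history = text[max(0, i - (self.n - 1)):i]
            nll -= math.log2(self.prob(history, ch))
        return nll / len(text)


if __name__ == "__main__":
    corpus = "ab" * 50
    for n in (1, 2):
        model = CharNGram(n, set(corpus))
        model.fit(corpus)
        print(f"n={n}: {model.bits_per_char('abab'):.3f} bits/char")
```

Because `_context` truncates the history, two histories that end in the same n-1 characters always receive identical predictions, whether or not an earlier bracket is still open. Raising n widens the window, but the number of possible contexts grows as |V|^(n-1), so counts become sparse quickly; an LSTM's recurrent state is not tied to a fixed window in this way.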

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications