Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics

Niru Maheswaranathan; Alex Williams; Matthew D. Golub; Surya Ganguli; David Sussillo

arXiv:1906.10720·cs.LG·December 6, 2019·26 cites

Open Access

TL;DR

This paper uses dynamical systems analysis to reverse engineer trained recurrent neural networks for sentiment classification, revealing that they operate via low-dimensional line attractor dynamics that are interpretable and consistent across architectures.

Contribution

It applies fixed-point finding and linearization, tools from dynamical systems analysis, to trained RNNs and uncovers a line attractor mechanism underlying sentiment classification that is shared across architectures.
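The fixed-point-and-linearization recipe can be sketched on a toy system. The snippet below is a minimal illustration, not the authors' code: it assumes a hypothetical 2-unit vanilla RNN with zero input, `h_next = tanh(W h)`, with a hand-picked `W`, and it uses a plain Newton iteration as a stand-in for whatever fixed-point search the paper uses. It then reads eigenvalues off the Jacobian at the fixed point and converts them to time constants via `|1 / ln|lambda||`, a standard conversion for discrete-time linear systems. An eigenvalue close to 1 marks a slowly decaying direction, which is the signature of an approximate line attractor.

```python
import math

# Hypothetical 2-unit vanilla RNN with zero input: h_next = tanh(W h).
# W is hand-picked for illustration (not from the paper); its eigenvalues
# are 0.98 along (1, 1) and 0.50 along (1, -1).
W = [[0.74, 0.24],
     [0.24, 0.74]]


def step(h):
    """One update of the toy RNN with no input."""
    return [math.tanh(W[i][0] * h[0] + W[i][1] * h[1]) for i in range(2)]


def jacobian(h):
    """Jacobian of step() at h: diag(1 - tanh^2(W h)) times W."""
    f = step(h)
    return [[(1.0 - f[i] ** 2) * W[i][j] for j in range(2)] for i in range(2)]


def find_fixed_point(h, iters=50):
    """Newton iteration on g(h) = step(h) - h = 0 (a stand-in search method)."""
    for _ in range(iters):
        f = step(h)
        g = [f[0] - h[0], f[1] - h[1]]
        J = jacobian(h)
        # Entries of (J - I), inverted explicitly as a 2x2 matrix.
        a, b, c, d = J[0][0] - 1.0, J[0][1], J[1][0], J[1][1] - 1.0
        det = a * d - b * c
        dh0 = (d * g[0] - b * g[1]) / det
        dh1 = (-c * g[0] + a * g[1]) / det
        h = [h[0] - dh0, h[1] - dh1]
    return h


def eigenvalues_2x2(M):
    """Eigenvalues of a 2x2 matrix, assuming they are real (true here)."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(tr * tr - 4.0 * det)
    return [(tr + disc) / 2.0, (tr - disc) / 2.0]


h_star = find_fixed_point([0.3, -0.2])
lams = eigenvalues_2x2(jacobian(h_star))
# Time constant (in steps) of each linearized mode.
taus = [abs(1.0 / math.log(abs(lam))) for lam in lams]
print("fixed point:", h_star)
print("eigenvalues:", lams)
print("time constants:", taus)
```

In this toy system the mode with the eigenvalue near 1 has a time constant tens of steps long, while the other mode decays within a couple of steps, so perturbations along the slow direction persist and everything else is forgotten quickly.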

Findings

01

Trained RNNs converge to low-dimensional, interpretable representations.

02

Line attractor dynamics explain how the RNNs perform sentiment classification.

03

The identified mechanism is consistent across different RNN architectures and datasets.
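How a line attractor can support sentiment classification is easiest to see in a linear cartoon. The sketch below is an assumption-laden toy, not the trained networks from the paper: the matrix `W`, the word list `VALENCE`, and the input vector are all made up. `W` has eigenvalue 1 along the direction (1, 1), so the state's projection onto that line accumulates word valences without decay, while the off-line component shrinks by half each step. The trained networks are nonlinear and their line attractor is only approximate.

```python
# Toy linear system h_next = W h + u with an exact line attractor along (1, 1).
# W has eigenvalue 1.0 along (1, 1) and 0.5 along (1, -1). All numbers are
# invented for illustration.
W = [[0.75, 0.25],
     [0.25, 0.75]]

# Hypothetical per-word valences: +1 positive, -1 negative, 0 neutral.
VALENCE = {"great": 1.0, "good": 1.0, "bad": -1.0, "awful": -1.0}


def run(words):
    """Feed words through the toy system and return the final state."""
    h = [0.0, 0.0]
    for word in words:
        v = VALENCE.get(word, 0.0)
        # Input is v * (1, 1) along the line plus 0.25 * v * (1, -1) off it.
        u = [1.25 * v, 0.75 * v]
        h = [W[0][0] * h[0] + W[0][1] * h[1] + u[0],
             W[1][0] * h[0] + W[1][1] * h[1] + u[1]]
    return h


def readout(h):
    """Position along the line attractor: projection onto (1, 1) / 2."""
    return (h[0] + h[1]) / 2.0


def off_line(h):
    """Component along (1, -1) / 2, which decays by half each step."""
    return (h[0] - h[1]) / 2.0


state = run("the movie was great great bad".split())
score = readout(state)
label = "positive" if score > 0 else "negative"
print(score, off_line(state), label)
```

The readout equals the running sum of valences, so classifying by its sign amounts to counting positive against negative words, with neutral words leaving the position on the line unchanged.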

Abstract

Recurrent neural networks (RNNs) are a widely used tool for modeling sequential data, yet they are often treated as inscrutable black boxes. Given a trained recurrent network, we would like to reverse engineer it--to obtain a quantitative, interpretable description of how it solves a particular task. Even for simple tasks, a detailed understanding of how recurrent networks work, or a prescription for how to develop such an understanding, remains elusive. In this work, we use tools from dynamical systems analysis to reverse engineer recurrent networks trained to perform sentiment classification, a foundational natural language processing task. Given a trained network, we find fixed points of the recurrent dynamics and linearize the nonlinear system around these fixed points. Despite their theoretical capacity to implement complex, high-dimensional computations, we find that trained…

Taxonomy

Topics: Topic Modeling · Machine Learning in Materials Science · Neural Networks and Applications