Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics
Niru Maheswaranathan, Alex Williams, Matthew D. Golub, Surya Ganguli, David Sussillo

TL;DR
This paper uses dynamical systems analysis to reverse engineer trained recurrent neural networks for sentiment classification, revealing that they operate via low-dimensional line attractor dynamics that are interpretable and consistent across architectures.
Contribution
It applies fixed-point analysis and linearization to trained RNNs, uncovering a line attractor mechanism underlying sentiment classification that recurs across architectures.
Findings
Trained RNNs converge to low-dimensional, interpretable dynamics organized around fixed points, despite their capacity for high-dimensional computation.
Line attractor dynamics explain how the RNNs perform sentiment classification.
The identified mechanism is consistent across different RNN architectures and datasets.
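As a rough illustration of what a line attractor does, the sketch below builds a two-dimensional linear system by hand. It is a toy under stated assumptions, not a network from the paper: the matrices `A` and `B` and the per-word `valences` are made-up values chosen so that one direction has eigenvalue 1 (the attractor line) and the other decays.

```python
import numpy as np

# Hypothetical toy system, not taken from the paper.
# Eigenvalue 1.0 along axis 0: state along this axis persists (the "line").
# Eigenvalue 0.5 along axis 1: state along this axis decays back to zero.
A = np.array([[1.0, 0.0],
              [0.0, 0.5]])
B = np.array([1.0, 1.0])                  # each input pushes the state along both axes
valences = [1.0, -1.0, 1.0, 1.0, 0.0]     # made-up per-word sentiment evidence

h = np.zeros(2)
for x in valences:
    h = A @ h + B * x                     # linear update driven by the input

for _ in range(50):
    h = A @ h                             # no input: off-line component decays away

evidence = h[0]                           # position along the line = summed evidence
label = "positive" if evidence > 0 else "negative"
print(evidence, label)
```

The position along the line ends up equal to the sum of the inputs, while the transverse component is forgotten. A readout that thresholds that position therefore acts on accumulated evidence, which is the kind of mechanism the findings above describe.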
Abstract
Recurrent neural networks (RNNs) are a widely used tool for modeling sequential data, yet they are often treated as inscrutable black boxes. Given a trained recurrent network, we would like to reverse engineer it--to obtain a quantitative, interpretable description of how it solves a particular task. Even for simple tasks, a detailed understanding of how recurrent networks work, or a prescription for how to develop such an understanding, remains elusive. In this work, we use tools from dynamical systems analysis to reverse engineer recurrent networks trained to perform sentiment classification, a foundational natural language processing task. Given a trained network, we find fixed points of the recurrent dynamics and linearize the nonlinear system around these fixed points. Despite their theoretical capacity to implement complex, high-dimensional computations, we find that trained…
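The two steps named in the abstract, finding fixed points and linearizing around them, can be sketched as follows. This is a minimal sketch under assumptions that are not in the source: a small input-free tanh RNN with random weights stands in for a trained network, and plain gradient descent on q(h) = 0.5 * ||F(h) - h||^2 stands in for whatever optimizer the authors used. The function names `rnn_step`, `jacobian`, and `find_fixed_point` are hypothetical.

```python
import numpy as np

def rnn_step(h, W, b):
    """One step of a hypothetical input-free vanilla RNN: h -> tanh(W h + b)."""
    return np.tanh(W @ h + b)

def jacobian(h, W, b):
    """Jacobian of rnn_step with respect to h: diag(1 - tanh^2(W h + b)) @ W."""
    pre = W @ h + b
    return (1.0 - np.tanh(pre) ** 2)[:, None] * W

def find_fixed_point(h0, W, b, lr=0.2, steps=2000):
    """Gradient descent on q(h) = 0.5 * ||F(h) - h||^2, starting from h0."""
    h = h0.copy()
    eye = np.eye(len(h))
    for _ in range(steps):
        residual = rnn_step(h, W, b) - h
        grad = (jacobian(h, W, b) - eye).T @ residual   # gradient of q at h
        h = h - lr * grad
    return h

# Assumed toy weights: random, rescaled to spectral norm 0.5 so that a
# single stable fixed point exists. A trained network would differ.
rng = np.random.default_rng(0)
n = 8
W = rng.normal(size=(n, n))
W *= 0.5 / np.linalg.norm(W, 2)
b = 0.1 * rng.normal(size=n)

h_star = find_fixed_point(np.zeros(n), W, b)
speed = 0.5 * np.sum((rnn_step(h_star, W, b) - h_star) ** 2)

# Linearize around the fixed point: eigenvalues of the Jacobian describe
# how small perturbations grow or decay along each eigen-direction.
eigvals = np.linalg.eigvals(jacobian(h_star, W, b))
print(speed, np.max(np.abs(eigvals)))
```

In this toy every eigenvalue has magnitude below 1, so all perturbations decay. An approximate line attractor would instead show up as a family of fixed or slow points, each with one eigenvalue close to 1 and the rest well inside the unit circle.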
Taxonomy
Topics: Topic Modeling · Machine Learning in Materials Science · Neural Networks and Applications
