Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations

Róbert Csordás; Christopher Potts; Christopher D. Manning; Atticus Geiger

arXiv:2408.10920·cs.LG·August 21, 2024

TL;DR

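This paper challenges the strong Linear Representation Hypothesis by showing that gated RNNs trained to repeat a token sequence store the token at each position at a distinct order of magnitude rather than along a distinct direction. The smallest models find only this magnitude-based solution, while larger models have linear representations.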

Contribution

The study provides a counterexample to the strong LRH, demonstrating that RNNs can learn non-linear, magnitude-based representations rather than solely linear directions.

Findings

01 Small RNNs encode tokens by magnitude rather than direction

02 Larger RNNs develop linear, direction-based representations

03 Interpretability should consider non-linear encoding mechanisms

Abstract

The Linear Representation Hypothesis (LRH) states that neural networks learn to encode concepts as directions in activation space, and a strong version of the LRH states that models learn only such encodings. In this paper, we present a counterexample to this strong LRH: when trained to repeat an input token sequence, gated recurrent neural networks (RNNs) learn to represent the token at each position with a particular order of magnitude, rather than a direction. These representations have layered features that are impossible to locate in distinct linear subspaces. To show this, we train interventions to predict and manipulate tokens by learning the scaling factor corresponding to each sequence position. These interventions indicate that the smallest RNNs find only this magnitude-based solution, while larger RNNs have linear representations. These findings strongly indicate that…
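
To make the idea of a magnitude-based encoding concrete, the following is a minimal toy sketch in Python with NumPy. It is not the authors' model or intervention code: the ±1 token codes, the fixed scale of 0.1, and the helper names `token_codes`, `encode`, `decode`, and `swap_token` are all assumptions chosen for illustration. It stores every position of a sequence in the same few dimensions, separated only by scale, and reads the tokens back by peeling off one layer at a time.

```python
import numpy as np

def token_codes(vocab_size, dim):
    """Hypothetical token codes: the binary digits of each id, mapped to -1/+1."""
    assert vocab_size <= 2 ** dim
    ids = np.arange(vocab_size)[:, None]
    bits = (ids >> np.arange(dim)) & 1
    return 2.0 * bits - 1.0

def encode(tokens, codes, scale=0.1):
    """Sum all positions into ONE vector; position p is scaled by scale**p."""
    h = np.zeros(codes.shape[1])
    for pos, tok in enumerate(tokens):
        h += (scale ** pos) * codes[tok]
    return h

def decode(h, codes, length, scale=0.1):
    """Peel layers: the largest-magnitude layer fixes the sign of every coordinate."""
    h = h.copy()
    out = []
    for _ in range(length):
        tok = int(np.argmax(codes @ np.sign(h)))  # code that matches the signs
        out.append(tok)
        h = (h - codes[tok]) / scale              # remove layer, rescale the rest
    return out

def swap_token(h, codes, pos, old, new, scale=0.1):
    """Toy intervention: edit one position using that position's scaling factor."""
    return h + (scale ** pos) * (codes[new] - codes[old])

codes = token_codes(vocab_size=8, dim=3)
tokens = [3, 0, 7, 5, 1]
h = encode(tokens, codes)          # five tokens share the same three dimensions
recovered = decode(h, codes, len(tokens))
edited = decode(swap_token(h, codes, pos=2, old=7, new=4), codes, len(tokens))
```

In this toy setup every position occupies the same three coordinates, so no position has a linear subspace of its own; the positions are distinguishable only by magnitude. The scale must stay below 0.5 for the sign-based readout to work, since the deeper layers then sum to less than the outermost one in every coordinate.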

Code & Models

Repositories

robertcsordas/onion_representations (PyTorch, official)

Taxonomy

Topics: Image Processing and 3D Reconstruction · Neural Networks and Applications · Handwritten Text Recognition Techniques