Emergent Symbol-like Number Variables in Artificial Neural Networks
Satchel Grant, Noah D. Goodman, James L. McClelland

TL;DR
This paper investigates how neural networks develop interpretable, symbol-like number variables in their hidden states when trained on sequence tasks, revealing architecture-dependent differences and the importance of neural subspace analysis.
Contribution
It introduces methods to interpret neural network activity as symbolic algorithms using neural subspaces and extends alignment techniques to better understand neural representations.
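To make "neural subspaces" concrete, here is a minimal sketch of a subspace-level interchange intervention of the kind used by alignment methods such as Distributed Alignment Search (named in the abstract). The first k coordinates of a rotated hidden state are copied from a source run into a base run, and the result is rotated back. The function names, the random rotation, and the value of k are assumptions for illustration; in DAS the rotation is learned by gradient descent, and this is not the authors' code.

```python
import numpy as np


def random_orthogonal(dim, seed=0):
    """Placeholder rotation: a random orthogonal matrix (DAS would learn this)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q


def interchange(h_base, h_source, rotation, k):
    """Replace the first k rotated coordinates of h_base with those of h_source."""
    z_base = rotation @ h_base      # base hidden state in the rotated basis
    z_source = rotation @ h_source  # source hidden state in the rotated basis
    z_mixed = z_base.copy()
    z_mixed[:k] = z_source[:k]      # swap only the hypothesized variable subspace
    return rotation.T @ z_mixed     # map back to the original neuron basis


if __name__ == "__main__":
    rot = random_orthogonal(4)
    base = np.array([1.0, 2.0, 3.0, 4.0])
    source = np.array([-1.0, 0.5, 0.0, 2.0])
    print(interchange(base, source, rot, k=2))
```

If the k swapped dimensions really encode a variable such as a count, the network's behavior after the swap should match what the symbolic program does when that variable is overwritten with the source run's value.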
Findings
Neural networks can develop graded, symbol-like number variables.
Alignment with symbolic algorithms varies by architecture and task.
Recurrent and transformer models learn fundamentally different solutions.
Abstract
What types of numeric representations emerge in neural systems, and what would a satisfying answer to this question look like? In this work, we interpret Neural Network (NN) solutions to sequence-based number tasks using a variety of methods to understand how well we can interpret them through the lens of interpretable Symbolic Algorithms (SAs) -- precise programs describable by rules and typed, mutable variables. We use autoregressive GRUs, LSTMs, and Transformers trained on tasks where the correct tokens depend on numeric information only latent in the task structure. We show through multiple causal and theoretical methods that we can interpret raw NN activity through the lens of simplified SAs when we frame the activity in terms of neural subspaces rather than individual neurons. Using Distributed Alignment Search (DAS), we find that, depending on network architecture,…
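To illustrate what a symbolic algorithm with a typed, mutable variable might look like for a counting task, here is a minimal sketch. It assumes the task shape one reviewer describes (repeat a token as many times as it occurred in the prompt); the token names `D`, `T`, `R`, and `EOS` and the function name are hypothetical, and the paper's actual programs may differ.

```python
from typing import List

# Hypothetical token vocabulary, chosen only for this illustration.
DEMO, TRIGGER, RESP, EOS = "D", "T", "R", "EOS"


def run_count_program(prompt: List[str]) -> List[str]:
    """Count demo tokens up, then count down while emitting response tokens."""
    count: int = 0  # the single mutable number variable
    for tok in prompt:
        if tok == DEMO:
            count += 1  # increment once per demonstration token
        elif tok == TRIGGER:
            break       # the trigger token ends the demonstration phase
    output: List[str] = []
    while count > 0:
        output.append(RESP)
        count -= 1      # decrement once per emitted response token
    output.append(EOS)  # stop once the count reaches zero
    return output


if __name__ == "__main__":
    print(run_count_program([DEMO, DEMO, DEMO, TRIGGER]))
```

The only information the output depends on is the value of `count`, which is the kind of latent numeric variable an alignment analysis would try to locate in the network's hidden state.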
Peer Reviews
Decision·Submitted to ICLR 2025
1. The paper furthers the body of work on interpreting how transformers complete mathematical or symbolic tasks.
2. The paper is well-written, and the motivation is clearly conveyed.
Novelty concerns: it does not seem like this paper advances the Pareto frontier of interpretability in transformer models. The task studied is very simple and involves repeating a token the same number of times as it occurred in the prompt, and though the introduction makes an attempt to distinguish it as a special study of symbolism that has not been previously explored, similar analyses exist in a wide range of the literature, on much more complicated tasks than this one.
1. The literature con
- Novel application of DAS to study the symbolic counting behaviors of RNNs and Transformers
- The experiments present interesting results to verify the hypothesis
- The idea that symbol-like variables can emerge purely from next-token prediction objectives in neural networks is not surprising. Given the success of large language models (LLMs) trained with next-token prediction (NTP), numerous recent studies have empirically and theoretically validated NTP's effectiveness as a universal learning approach for many tasks including numerical reasoning [1,2].
- The paper merely highlights the alignment between the hypothesis program and the neural network rep
1. The motivation of the paper is reasonable, and the description is very clear, making the thought process easy to follow. It’s worth mentioning that although I am not familiar with the field of causal abstraction, reading the paper and reviewing the related work allowed me to gain a general understanding of the field and appreciate the contributions of this work, which is commendable.
2. The experimental design appears to be reasonable and thorough. It identifies different characteristics of
1. Although the content of the paper is acceptable to me, I feel a bit disappointed that the paper only briefly mentions leaving more complex tasks and larger models for future research. I am unsure whether the current research approach has enough potential for further expansion.
2. As stated in Section 4.3, the reason why the Same-Object task causes poorer alignment in recurrent neural networks compared to both the Single-Object and Multi-Object tasks is unclear.
Taxonomy
Topics: Computational Physics and Python Applications
