Emergent Symbol-like Number Variables in Artificial Neural Networks

Satchel Grant; Noah D. Goodman; James L. McClelland

arXiv:2501.06141 · cs.LG · August 19, 2025

Open Access 3 Reviews

TL;DR

This paper investigates how neural networks develop interpretable, symbol-like number variables in their hidden states when trained on sequence tasks, revealing architecture-dependent differences and the importance of neural subspace analysis.

Contribution

It introduces methods to interpret neural network activity as symbolic algorithms using neural subspaces and extends alignment techniques to better understand neural representations.
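To make the subspace idea concrete, the following is a minimal sketch of an interchange intervention in a rotated basis, the kind of operation that alignment methods such as DAS build on. It is not the authors' implementation: the rotation here is a random orthogonal matrix standing in for a learned one, and the names `random_orthogonal`, `interchange`, and the subspace size `k` are hypothetical.

```python
import numpy as np

def random_orthogonal(dim, seed=0):
    """Hypothetical stand-in for a learned rotation: Q factor of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def interchange(h_base, h_source, rotation, k):
    """Replace the first k rotated coordinates of h_base with those of h_source."""
    z_base = rotation @ h_base        # base hidden state in the rotated basis
    z_source = rotation @ h_source    # source hidden state in the rotated basis
    z_new = z_base.copy()
    z_new[:k] = z_source[:k]          # swap only the candidate "variable" subspace
    return rotation.T @ z_new         # rotate back to the neuron basis

dim, k = 8, 2                         # assumed sizes, for illustration only
Q = random_orthogonal(dim)
rng = np.random.default_rng(1)
h_base, h_source = rng.standard_normal(dim), rng.standard_normal(dim)
h_patched = interchange(h_base, h_source, Q, k)
```

The patched state carries the source's value in the chosen subspace and the base's value everywhere else. If a network's behavior after such a swap matches what a symbolic program would do with the swapped variable, that is evidence the subspace encodes the variable, even when no single neuron does.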

Findings

01. Neural networks can develop graded, symbol-like number variables.

02. Alignment with symbolic algorithms varies by architecture and task.

03. Recurrent and transformer models learn fundamentally different solutions.

Abstract

What types of numeric representations emerge in neural systems, and what would a satisfying answer to this question look like? In this work, we analyze Neural Network (NN) solutions to sequence-based number tasks using a variety of methods to understand how well we can interpret them through the lens of Symbolic Algorithms (SAs) -- precise programs describable by rules and typed, mutable variables. We use autoregressive GRUs, LSTMs, and Transformers trained on tasks where the correct tokens depend on numeric information only latent in the task structure. We show through multiple causal and theoretical methods that we can interpret raw NN activity through the lens of simplified SAs when we frame the activity in terms of neural subspaces rather than individual neurons. Using Distributed Alignment Search (DAS), we find that, depending on network architecture,…
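As an illustration of what a Symbolic Algorithm with a typed, mutable variable might look like, here is a minimal sketch for a counting task of the kind one reviewer describes (repeating a token as many times as it occurred in the prompt). The token names `"D"`, `"T"`, `"R"`, `"E"` and the function `count_task_program` are assumptions made for this example, not the paper's vocabulary or code.

```python
def count_task_program(prompt, trigger="T", response="R", eos="E"):
    """Hypothetical symbolic program: count prompt tokens, then emit that many responses."""
    count = 0                      # the single integer-typed, mutable variable
    for token in prompt:
        if token == trigger:       # trigger token ends the counting phase
            break
        count += 1                 # increment once per demonstration token
    output = []
    while count > 0:               # response phase: count back down to zero
        output.append(response)
        count -= 1
    output.append(eos)             # emit end-of-sequence once the count is exhausted
    return output

print(count_task_program(["D", "D", "D", "T"]))  # ['R', 'R', 'R', 'E']
```

The correct output depends only on the value of `count`, which never appears explicitly in the tokens. That is the sense in which the numeric information is latent in the task structure.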

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 4

Strengths

1. The paper furthers the body of work on interpreting how transformers complete mathematical or symbolic tasks.
2. The paper is well-written, and the motivation is clearly conveyed.

Weaknesses

Novelty concerns: it does not seem like this paper advances the Pareto frontier of interpretability in transformer models. The task studied is very simple and involves repeating a token the same number of times as it occurred in the prompt, and though the introduction makes an attempt to distinguish it as a special study of symbolism that has not been previously explored, similar analyses exist in a wide range of the literature, on much more complicated tasks than this one.
1. The literature con

Reviewer 02 · Rating 5 · Confidence 4

Strengths

- Novel application of DAS to study the symbolic counting behaviors of RNNs and Transformers
- The experiments present interesting results to verify the hypothesis

Weaknesses

- The idea that symbol-like variables can emerge purely from next-token prediction objectives in neural networks is not surprising. Given the success of large language models (LLMs) trained with next-token prediction (NTP), numerous recent studies have empirically and theoretically validated NTP's effectiveness as a universal learning approach for many tasks including numerical reasoning [1,2].
- The paper merely highlights the alignment between the hypothesis program and the neural network rep

Reviewer 03 · Rating 6 · Confidence 2

Strengths

1. The motivation of the paper is reasonable, and the description is very clear, making the thought process easy to follow. It’s worth mentioning that although I am not familiar with the field of causal abstraction, reading the paper and reviewing the related work allowed me to gain a general understanding of the field and appreciate the contributions of this work, which is commendable.
2. The experimental design appears to be reasonable and thorough. It identifies different characteristics of

Weaknesses

1. Although the content of the paper is acceptable to me, I feel a bit disappointed that the paper only briefly mentions leaving more complex tasks and larger models for future research. I am unsure whether the current research approach has enough potential for further expansion.
2. As stated in Section 4.3, the reason why the Same-Object task causes poorer alignment in recurrent neural networks compared to both the Single-Object and Multi-Object tasks is unclear.

Taxonomy

Topics: Computational Physics and Python Applications