On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication
Zichao Wei

TL;DR
This paper challenges the belief that neural networks struggle with integer multiplication because of long-range dependencies. It argues that the perceived difficulty comes from the chosen computational representation rather than from an intrinsic property of the task.
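To make the carry-chain argument concrete, here is a minimal sketch (our own illustration, not code from the paper). It performs ripple-carry addition on little-endian bit lists, which is the operation long multiplication repeats when it accumulates partial products. The helper names `to_bits` and `ripple_add` are hypothetical. Adding 1 to 2^n - 1 sends a single carry through all n positions, so the top output bit depends on the bottom input bits. In a 1D layout, this is the kind of O(n) dependency the belief refers to.

```python
def to_bits(x: int, n: int) -> list[int]:
    """Little-endian list of the n lowest bits of x (hypothetical helper)."""
    return [(x >> k) & 1 for k in range(n)]


def ripple_add(x_bits: list[int], y_bits: list[int]) -> tuple[list[int], int]:
    """Ripple-carry addition of two equal-length little-endian bit lists.

    Returns the n + 1 output bits and the length of the longest run of
    consecutive positions that emitted a carry.
    """
    out, carry, run, longest = [], 0, 0, 0
    for xb, yb in zip(x_bits, y_bits):
        s = xb + yb + carry
        out.append(s % 2)  # sum bit at this position
        carry = s // 2     # carry forwarded to the next position
        run = run + 1 if carry else 0
        longest = max(longest, run)
    out.append(carry)      # the final carry becomes the top bit
    return out, longest


n = 8
bits, chain = ripple_add(to_bits(2**n - 1, n), to_bits(1, n))
print(sum(b << k for k, b in enumerate(bits)), chain)  # 256 8
```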
Contribution
The paper formalizes the concept of a computational "mirage" and shows that laying multiplication out on a 2D outer-product grid turns every step into a local operation. Under this representation, a neural cellular automaton generalizes without running into long-range dependency issues.
Findings
A neural cellular automaton with only 321 learnable parameters achieves perfect length generalization.
Five alternative architectures, including a Transformer (6,625 parameters), Transformer+RoPE, and Mamba, all fail under the same representation.
Long-range dependency is not intrinsic to multiplication. It depends on the choice of computational spacetime.
Abstract
Integer multiplication has long been considered a hard problem for neural networks, with the difficulty widely attributed to the O(n) long-range dependency induced by carry chains. We argue that this diagnosis is wrong: long-range dependency is not an intrinsic property of multiplication, but a mirage produced by the choice of computational spacetime. We formalize the notion of mirage and provide a constructive proof: when two n-bit binary integers are laid out as a 2D outer-product grid, every step of long multiplication collapses into a local neighborhood operation. Under this representation, a neural cellular automaton with only 321 learnable parameters achieves perfect length generalization up to the training range. Five alternative architectures -- including Transformer (6,625 params), Transformer+RoPE, and Mamba -- all fail under the same representation.…
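The locality claim can be illustrated with a hand-written update rule. The sketch below is our own illustration under stated assumptions. It is not the paper's construction or its learned 321-parameter automaton. It places the partial-product bits a_i AND b_j of the outer-product grid in a skewed layout (row i, column i + j, as in long multiplication) and then repeats one synchronous update in which every cell reads only itself and its immediate neighbors. In each update, row 0 keeps its parity and passes half its count to the next more significant column, and every other row shifts up by one. The function name `local_multiply` and the specific rule are hypothetical.

```python
def local_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Multiply two n-bit integers using only radius-1 neighborhood updates.

    Hypothetical rule, for illustration only. Returns (product, steps).
    """
    width = 2 * n
    # Skewed outer-product grid: the bit a_i AND b_j (weight 2^(i+j))
    # sits in row i, column i + j.
    grid = [[0] * width for _ in range(n)]
    for i in range(n):
        for j in range(n):
            grid[i][i + j] = ((a >> i) & 1) & ((b >> j) & 1)

    steps = 0
    # Global halting check only; each update inside the loop is local.
    while any(any(row) for row in grid[1:]) or any(c > 1 for c in grid[0]):
        new = [[0] * width for _ in range(n)]
        for k in range(width):
            # Row 0: keep own parity, receive the carry from column k - 1,
            # and absorb the cell directly below (row 1, same column).
            # The top column never carries out: a * b < 2^(2n), so its
            # count stays at most 1.
            carry_in = grid[0][k - 1] // 2 if k > 0 else 0
            below = grid[1][k] if n > 1 else 0
            new[0][k] = grid[0][k] % 2 + carry_in + below
            # Rows 1..n-2 copy the cell below them; row n-1 becomes empty.
            for i in range(1, n - 1):
                new[i][k] = grid[i + 1][k]
        grid = new
        steps += 1
    # Row 0 now holds the product in little-endian binary.
    return sum(bit << k for k, bit in enumerate(grid[0])), steps


print(local_multiply(13, 11, 4)[0])  # 143
```

In this sketch a carry advances only one column per step. The long carry chain therefore does not disappear. It becomes a sequence of time steps, each of which is local in space. This is one way to read the claim that long-range dependency depends on the chosen computational spacetime.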
