Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials
Hiroki Furuta, Gouki Minegishi, Yusuke Iwasawa, Yutaka Matsuo

TL;DR
This paper investigates the internal representations and circuits of Transformers during grokking on various modular polynomial tasks, revealing how Fourier analysis explains their generalization and transferability.
Contribution
It introduces Fourier-based analysis and new measures to interpret internal circuits in grokked Transformers across different modular operations, including polynomials.
Findings
Fourier analysis characterizes internal representations for modular operations.
Transferability of grokking is limited to specific operation pairs.
Multi-task learning can lead to co-grokking and faster generalization.
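As a rough illustration of what "Fourier analysis of internal representations" can mean in practice, the sketch below takes a discrete Fourier transform of an embedding matrix along the token axis and checks which frequencies carry most of the norm. Everything here is an assumption made for illustration: the embedding is synthetic (built from three hand-picked frequencies plus noise), and `frequency_norms`, `p`, `d`, and `key_freqs` are hypothetical names, not the paper's measures or models.

```python
import numpy as np

def frequency_norms(embedding):
    """L2 norm of each DFT frequency component, taken over the token axis."""
    spectrum = np.fft.rfft(embedding, axis=0)  # shape: (p // 2 + 1, d)
    return np.linalg.norm(spectrum, axis=1)

# Synthetic stand-in for a learned embedding (assumption, not a trained model).
p, d = 97, 32                  # modulus (number of tokens) and embedding width
key_freqs = [3, 17, 41]        # hand-picked frequencies for the toy example
rng = np.random.default_rng(0)
tokens = np.arange(p)

embedding = 0.01 * rng.standard_normal((p, d))  # small dense noise
for k in key_freqs:
    phase = 2 * np.pi * k * tokens / p
    # Each frequency contributes a cosine and a sine direction in embedding space.
    embedding += np.outer(np.cos(phase), rng.standard_normal(d))
    embedding += np.outer(np.sin(phase), rng.standard_normal(d))

norms = frequency_norms(embedding)
top = sorted(np.argsort(norms)[-len(key_freqs):].tolist())
print(top)
```

A spectrum concentrated on a few frequencies, as in this toy case, is the kind of sparse Fourier structure that such an analysis would look for in a trained model.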
Abstract
Grokking has been actively explored to reveal the mystery of delayed generalization, and identifying interpretable representations and algorithms inside grokked models offers a suggestive hint for understanding its mechanism. Grokking on modular addition is known to implement a Fourier representation and calculation circuits based on trigonometric identities in Transformers. Given the periodicity of modular arithmetic, a natural question is to what extent these explanations and interpretations hold for grokking on modular operations beyond addition. For a closer look, we first hypothesize that (1) any modular operation can be characterized by a distinctive Fourier representation or internal circuit, (2) grokked models acquire common features transferable among similar operations, and (3) mixing datasets of similar operations promotes grokking. Then, we extensively examine…
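The trigonometric-identity mechanism mentioned in the abstract for modular addition can be sketched as follows. This is a minimal hand-written version under stated assumptions, not the circuit extracted from any trained model: the function name `modular_add_via_fourier`, the modulus `p = 97`, and the frequency set `freqs` are all hypothetical choices. The idea is to represent each input by cosines and sines at a few frequencies, combine them with the angle-addition identities, and score every candidate answer `c` by cos(w(a + b - c)), which peaks exactly when c equals (a + b) mod p.

```python
import numpy as np

def modular_add_via_fourier(a, b, p, freqs):
    """Recover (a + b) mod p from cos/sin features using trig identities."""
    c = np.arange(p)           # all candidate answers
    logits = np.zeros(p)
    for k in freqs:
        w = 2 * np.pi * k / p
        # Angle-addition identities give cos(w(a+b)) and sin(w(a+b)).
        cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
        sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
        # cos(w(a+b-c)) = cos(w(a+b)) cos(wc) + sin(w(a+b)) sin(wc)
        logits += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
    return int(np.argmax(logits))

p = 97                 # prime modulus (illustrative choice)
freqs = (1, 5, 14)     # illustrative nonzero frequencies
print(modular_add_via_fourier(50, 60, p, freqs))
```

Because `p` is prime, each nonzero frequency alone already attains its maximum only at the correct answer; summing several frequencies widens the margin between the correct logit and the rest.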
Taxonomy
Topics: Computability, Logic, AI Algorithms
Methods: Hierarchical Information Threading
