Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials

Hiroki Furuta; Gouki Minegishi; Yusuke Iwasawa; Yutaka Matsuo

arXiv:2402.16726 · cs.LG · December 31, 2024 · 1 citation

TL;DR

This paper investigates the internal representations and circuits of Transformers during grokking on various modular polynomial tasks, revealing how Fourier analysis explains their generalization and transferability.

Contribution

The paper introduces Fourier-based analysis and new measures for interpreting internal circuits in grokked Transformers across modular operations, including polynomials.

Findings

01 Fourier analysis characterizes internal representations for modular operations.

02 Transferability of grokking is limited to specific operation pairs.

03 Multi-task learning can lead to co-grokking and faster generalization.
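As a rough illustration of what a Fourier analysis of internal representations can look like, the sketch below applies a discrete Fourier transform along the token axis of an embedding matrix and reports which frequencies carry the most energy. Everything here is an assumption made for illustration: the embedding is synthetic, the helper name `dominant_frequencies`, the modulus `p = 97`, and the frequencies `[3, 14, 41]` are hypothetical, and this is not the specific measure defined in the paper.

```python
import numpy as np

def dominant_frequencies(embedding, top_k):
    """Return the top_k frequency indices with the largest spectral norm.

    embedding: array of shape (p, d), one row per input token 0..p-1.
    """
    # DFT along the token axis: row k holds the frequency-k coefficients.
    spectrum = np.fft.rfft(embedding, axis=0)
    # Aggregate over the embedding dimension with an L2 norm per frequency.
    power = np.linalg.norm(spectrum, axis=1)
    power[0] = 0.0  # ignore the constant (zero-frequency) component
    return sorted(np.argsort(power)[-top_k:].tolist())

# Synthetic stand-in for a grokked embedding (assumption, not real weights):
# a few cosine/sine waves over the tokens plus a small amount of noise.
p, d = 97, 32
rng = np.random.default_rng(0)
tokens = np.arange(p)
true_freqs = [3, 14, 41]

E = np.zeros((p, d))
for k in true_freqs:
    E += np.outer(np.cos(2 * np.pi * k * tokens / p), rng.normal(size=d))
    E += np.outer(np.sin(2 * np.pi * k * tokens / p), rng.normal(size=d))
E += 0.01 * rng.normal(size=(p, d))

found = dominant_frequencies(E, top_k=3)
print(found)
```

On a real model, `E` would be replaced by the trained embedding matrix restricted to the `p` number tokens; a spectrum concentrated on a few frequencies is the kind of sparse Fourier structure the finding refers to.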

Abstract

Grokking has been actively explored to reveal the mystery of delayed generalization, and identifying interpretable representations and algorithms inside grokked models is a suggestive hint for understanding its mechanism. Grokking on modular addition is known to implement a Fourier representation and calculation circuits based on trigonometric identities in Transformers. Given the periodicity of modular arithmetic, a natural question is to what extent these explanations and interpretations hold for grokking on modular operations beyond addition. For a closer look, we first hypothesize that (1) any modular operation can be characterized by a distinctive Fourier representation or internal circuit, (2) grokked models obtain common features transferable among similar operations, and (3) mixing datasets with similar operations promotes grokking. Then, we extensively examine…
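The trigonometric-identity circuit for modular addition mentioned in the abstract can be sketched numerically. The code below is a minimal illustration, not the trained model: it assumes a prime modulus `p = 97` and a hypothetical set of key frequencies `[3, 14, 41]`, builds cos and sin of w(a+b) from the angle-addition identities, and scores each candidate answer c by cos(w(a+b-c)), which peaks exactly when c = (a + b) mod p.

```python
import numpy as np

def fourier_mod_add(a, b, p, freqs):
    """Compute (a + b) mod p using only trigonometric identities."""
    c = np.arange(p)          # every candidate answer
    logits = np.zeros(p)
    for k in freqs:
        w = 2 * np.pi * k / p
        # Angle-addition identities give cos(w(a+b)) and sin(w(a+b))
        # from the per-token features cos(wa), sin(wa), cos(wb), sin(wb).
        cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
        sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
        # cos(w(a+b-c)) = cos(w(a+b))cos(wc) + sin(w(a+b))sin(wc)
        logits += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
    # Each term equals 1 only when c is congruent to a + b modulo p.
    return int(np.argmax(logits))

p = 97                 # assumed prime modulus
freqs = [3, 14, 41]    # hypothetical key frequencies
print(fourier_mod_add(50, 60, p, freqs))
```

Summing over several frequencies widens the gap between the correct answer and its nearest competitors; with a single frequency the runner-up logit is cos(2π/p), which is very close to 1 for large p.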

Code & Models

Repositories

frt03/grok_mod_poly (official)

Taxonomy

Topics: Computability, Logic, AI Algorithms

Methods: Hierarchical Information Threading