Grokking Finite-Dimensional Algebra
Pascal Jr Tikeng Notsawo, Guillaume Dumas, Guillaume Rabusseau

TL;DR
This paper studies grokking in neural networks trained to learn multiplication in finite-dimensional algebras, showing how algebraic properties influence generalization and learning dynamics.
Contribution
It extends grokking analysis from group operations to general algebras, connecting learning dynamics to algebraic structure and tensor properties.
Findings
Grokking is influenced by algebraic properties like commutativity and associativity.
Structural tensor features such as sparsity and rank affect generalization.
Latent embeddings aligned with algebraic representations correlate with successful generalization.
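The properties named in these findings can all be read off an algebra's structure tensor. The following is a minimal sketch, assuming a NumPy array `C` with `C[i, j, k]` the coefficient of basis element `e_k` in the product `e_i * e_j`. The helper names are hypothetical. The rank of a matrix unfolding is used only as a simple proxy, since this summary does not state which notion of rank the paper uses.

```python
import numpy as np

def is_commutative(C):
    # e_i * e_j == e_j * e_i for all basis pairs
    return bool(np.allclose(C, C.transpose(1, 0, 2)))

def is_associative(C):
    # (e_i * e_j) * e_l == e_i * (e_j * e_l) for all basis triples
    left = np.einsum("ijm,mlk->ijlk", C, C)
    right = np.einsum("jlm,imk->ijlk", C, C)
    return bool(np.allclose(left, right))

def sparsity(C):
    # Fraction of zero entries in the structure tensor
    return 1.0 - np.count_nonzero(C) / C.size

def unfolding_rank(C):
    # Rank of the (n*n) x n unfolding; a proxy, not necessarily the paper's definition
    n = C.shape[0]
    return int(np.linalg.matrix_rank(C.reshape(n * n, n)))

# Complex numbers as a 2-dimensional real algebra with basis (1, i)
complex_C = np.zeros((2, 2, 2))
complex_C[0, 0, 0] = 1.0   # 1 * 1 = 1
complex_C[0, 1, 1] = 1.0   # 1 * i = i
complex_C[1, 0, 1] = 1.0   # i * 1 = i
complex_C[1, 1, 0] = -1.0  # i * i = -1

# Cross product on R^3: non-commutative and non-associative
cross_C = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    cross_C[i, j, k] = 1.0
    cross_C[j, i, k] = -1.0

for name, C in [("complex", complex_C), ("cross", cross_C)]:
    print(name, is_commutative(C), is_associative(C), sparsity(C), unfolding_rank(C))
```

The two example algebras are chosen only to contrast a commutative, associative case with one that is neither; they are not taken from the paper's experiments.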
Abstract
This paper investigates grokking, the sudden transition from a long memorization phase to generalization observed during neural network training, in the context of learning multiplication in finite-dimensional algebras (FDA). While prior work on grokking has focused mainly on group operations, we extend the analysis to more general algebraic structures, including non-associative, non-commutative, and non-unital algebras. We show that learning group operations is a special case of learning FDA, and that learning multiplication in FDA amounts to learning a bilinear product specified by the algebra's structure tensor. For algebras over the reals, we connect the learning problem to matrix factorization with an implicit low-rank bias, and for algebras over finite fields, we show that grokking emerges naturally as models must learn discrete representations of…
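To illustrate the claim that group operations are a special case of a bilinear product given by a structure tensor, here is a minimal sketch using the cyclic group Z_n, chosen purely for illustration. It assumes basis vectors indexed by group elements and a tensor `C` with `C[i, j, k] = 1` exactly when `(i + j) mod n == k`. The function names are hypothetical and not taken from the paper.

```python
import numpy as np

def cyclic_group_structure_tensor(n):
    # C[i, j, k] = 1 iff e_i * e_j = e_k, i.e. k = (i + j) mod n
    C = np.zeros((n, n, n))
    for i in range(n):
        for j in range(n):
            C[i, j, (i + j) % n] = 1.0
    return C

def multiply(C, x, y):
    # Bilinear product: z_k = sum_{i,j} C[i, j, k] * x_i * y_j
    return np.einsum("ijk,i,j->k", C, x, y)

n = 5
C = cyclic_group_structure_tensor(n)
e = np.eye(n)  # one-hot basis vectors, one per group element

# Multiplying basis vectors reproduces the group operation: 2 + 4 = 1 (mod 5)
product = multiply(C, e[2], e[4])
result_index = int(np.argmax(product))
print(result_index)

# The same tensor also multiplies arbitrary vectors, here a cyclic convolution
x = np.array([1.0, 2.0, 0.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0, 0.0, 3.0])
print(multiply(C, x, y))
```

On one-hot inputs the product is again one-hot, so learning the group table amounts to learning `C` restricted to basis vectors.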
