Controlling Grokking with Nonlinearity and Data Symmetry

Ahmed Salah; David Yevick

arXiv:2411.05353 · cs.LG · November 11, 2024

TL;DR

This paper explores how adjusting nonlinearity and data symmetry in neural networks influences grokking behavior, enabling control over generalization and revealing patterns useful for factoring composite moduli.

Contribution

It introduces ways to control grokking through the profile of the activation function and the network's depth and width, and links weight entropy and nonlinearity to generalization and data symmetry.
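To make these control knobs concrete, here is a minimal NumPy sketch of a modular-addition dataset and a multilayer perceptron whose depth, width and activation profile are parameters. The one-hot input encoding, the `leaky(alpha)` activation family (where `alpha=1` gives a purely linear network and smaller `alpha` a more nonlinear one) and all function names are hypothetical illustrations, not the authors' setup; training is omitted.

```python
import numpy as np

def modular_addition_data(P):
    """All P*P pairs (a, b), one-hot encoded, labelled with (a + b) mod P."""
    a, b = np.meshgrid(np.arange(P), np.arange(P), indexing="ij")
    a, b = a.ravel(), b.ravel()
    X = np.concatenate([np.eye(P)[a], np.eye(P)[b]], axis=1)  # shape (P*P, 2P)
    y = (a + b) % P
    return X, y

def leaky(alpha):
    """Hypothetical activation family: alpha=1 is linear, alpha=0 is ReLU."""
    return lambda z: np.where(z > 0, z, alpha * z)

def init_mlp(P, width, depth, rng):
    """`depth` hidden layers of size `width`, mapping 2P inputs to P logits."""
    sizes = [2 * P] + [width] * depth + [P]
    return [(rng.normal(scale=1 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, X, activation):
    """Apply the activation after every layer except the output layer."""
    h = X
    for W, b in params[:-1]:
        h = activation(h @ W + b)
    W, b = params[-1]
    return h @ W + b  # logits over the P residues

rng = np.random.default_rng(0)
P = 7
X, y = modular_addition_data(P)
params = init_mlp(P, width=32, depth=2, rng=rng)
logits = forward(params, X, leaky(0.1))
print(X.shape, logits.shape)  # (49, 14) (49, 7)
```

Sweeping `depth`, `width` and `alpha` and training each configuration is one way to explore how the degree of nonlinearity affects when, or whether, the network groks.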

Findings

01

Increasing nonlinearity, by adding layers, makes the patterns formed by the PCA projections of the final-layer weights more uniform.

02

Patterns in these weight projections can be used to factor the modulus P when P is nonprime.

03

Weight entropy correlates with the network's generalization ability.
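One simple way to compute a weight entropy is to histogram the weight values and take the Shannon entropy of the resulting distribution. The sketch below assumes that definition; the binning, the use of nats, and the per-neuron `local_entropies` helper are hypothetical choices, not the paper's stated formula.

```python
import numpy as np

def weight_entropy(weights, bins=50):
    """Shannon entropy (nats) of the histogram of all values in `weights`.

    Assumed definition: histogram binning and bin count are illustrative
    choices, not the paper's formula.
    """
    w = np.asarray(weights, dtype=float).ravel()
    counts, _ = np.histogram(w, bins=bins)
    p = counts[counts > 0] / counts.sum()  # drop empty bins to avoid log(0)
    return float(-np.sum(p * np.log(p)))

def local_entropies(W, bins=20):
    """Hypothetical per-neuron ('local') entropy: one value per row of W."""
    return np.array([weight_entropy(row, bins=bins) for row in W])

rng = np.random.default_rng(0)
broad = rng.normal(size=(64, 97))  # weights spread over many values
sparse = np.zeros((64, 97))
sparse[0, 0] = 1.0                 # nearly all weights identical
print(weight_entropy(broad) > weight_entropy(sparse))  # True
print(local_entropies(broad).shape)                    # (64,)
```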

Abstract

This paper demonstrates that grokking in a neural network trained on modular arithmetic with modulus P can be controlled by modifying the profile of the activation function as well as the depth and width of the model. Plotting the even PCA projections of the weights of the last layer against their odd projections further yields patterns that become significantly more uniform when the nonlinearity is increased by adding layers. These patterns can be employed to factor P when P is nonprime. Finally, a metric for the generalization ability of the network is inferred from the entropy of the layer weights, while the degree of nonlinearity is related to correlations between the local entropies of the weights of the neurons in the final layer.
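The even-versus-odd PCA plot described in the abstract can be sketched as follows, assuming that the rows of the last-layer weight matrix (one per output residue) are the samples and that principal component 2k is plotted against component 2k+1. The synthetic weights, built from two Fourier modes, are a hypothetical stand-in for a trained network, chosen so that each even/odd pair traces a circle.

```python
import numpy as np

def even_odd_pca_pairs(W, n_pairs):
    """Return (PC 2k, PC 2k+1) projections of the rows of W for k < n_pairs."""
    Wc = W - W.mean(axis=0, keepdims=True)  # center each column
    _, _, Vt = np.linalg.svd(Wc, full_matrices=False)
    proj = Wc @ Vt.T                         # row coordinates in the PC basis
    return [(proj[:, 2 * k], proj[:, 2 * k + 1]) for k in range(n_pairs)]

# Hypothetical last-layer weights for P = 11 residues and 16 hidden units:
# residue c is embedded with Fourier modes of frequency 2 (amplitude 2)
# and frequency 5 (amplitude 1), then mixed into 16 dims by an orthonormal map.
P, hidden = 11, 16
c = np.arange(P)
F = np.stack([2 * np.cos(2 * np.pi * 2 * c / P), 2 * np.sin(2 * np.pi * 2 * c / P),
              np.cos(2 * np.pi * 5 * c / P), np.sin(2 * np.pi * 5 * c / P)], axis=1)
Q, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(hidden, 4)))
W = F @ Q.T  # shape (P, hidden)

pairs = even_odd_pca_pairs(W, n_pairs=2)
for k, (even, odd) in enumerate(pairs):
    radii = np.hypot(even, odd)  # constant radius means the points lie on a circle
    print(f"PC{2 * k} vs PC{2 * k + 1}: radius min={radii.min():.3f} max={radii.max():.3f}")
```

With real trained weights, one would scatter-plot each (even, odd) pair, for instance with matplotlib, and inspect how uniformly the points are arranged.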

Taxonomy

Topics: Neural Networks and Applications

Methods: Principal Components Analysis