Understanding Simplicity Bias towards Compositional Mappings via Learning Dynamics

Yi Ren; Danica J. Sutherland

arXiv:2409.09626 · cs.LG · September 17, 2024

TL;DR

This paper studies why neural networks tend to learn compositional mappings. It shows that such mappings are the simplest bijections as measured by coding length, and that a bias towards simplicity is usually intrinsic to gradient descent training, which partially explains why some models generalize well.

Contribution

The study shows that compositional mappings are the simplest bijections under a coding-length measure (an upper bound on their Kolmogorov complexity) and that gradient descent training usually favors such simple mappings, which partially explains why appropriately trained models generalize compositionally.

Findings

01. Compositional mappings are the simplest bijections when simplicity is measured by coding length, an upper bound on Kolmogorov complexity.

02. Simplicity bias is usually an intrinsic property of neural network training via gradient descent.

03. This bias partially explains why some models spontaneously generalize well when they are trained appropriately.

Abstract

Obtaining compositional mappings is important for the model to generalize well compositionally. To better understand when and how to encourage the model to learn such mappings, we study their uniqueness through different perspectives. Specifically, we first show that the compositional mappings are the simplest bijections through the lens of coding length (i.e., an upper bound of their Kolmogorov complexity). This property explains why models having such mappings can generalize well. We further show that the simplicity bias is usually an intrinsic property of neural network training via gradient descent. That partially explains why some models spontaneously generalize well when they are trained appropriately.
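
As a rough illustration of the coding-length argument, the sketch below counts how many bits a uniform code needs to single out one mapping from a family of bijections. The setup is an assumption made for illustration and is not the paper's exact coding scheme: objects are tuples of `num_attrs` attributes with `num_values` values each, messages have the same shape, and a mapping is called compositional here when each message position depends on exactly one attribute. The function names are hypothetical.

```python
import math


def log2_factorial(n: int) -> float:
    """Return log2(n!) using lgamma, so large n does not overflow."""
    return math.lgamma(n + 1) / math.log(2)


def bits_arbitrary_bijection(num_attrs: int, num_values: int) -> float:
    """Bits needed to index one bijection among all (v^m)! of them."""
    num_objects = num_values ** num_attrs
    return log2_factorial(num_objects)


def bits_compositional(num_attrs: int, num_values: int) -> float:
    """Bits needed to index one compositional bijection.

    Assumed family: pick which message position encodes each attribute
    (m! choices), then a permutation of values per attribute ((v!)^m choices).
    """
    return log2_factorial(num_attrs) + num_attrs * log2_factorial(num_values)


if __name__ == "__main__":
    for m, v in [(2, 3), (3, 4), (4, 8)]:
        comp = bits_compositional(m, v)
        arbitrary = bits_arbitrary_bijection(m, v)
        print(f"m={m}, v={v}: compositional={comp:.1f} bits, "
              f"arbitrary={arbitrary:.1f} bits")
```

Under these assumptions the compositional family needs far fewer bits, and the gap widens quickly as the number of attributes or values grows, which is consistent with the claim that compositional mappings have short descriptions.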

Code & Models

Repositories

joshua-ren/simplicity_bias_learning_dynamics (Official)
