Teasing Apart Architecture and Initial Weights as Sources of Inductive Bias in Neural Networks

Gianluca Bencomo; Max Gupta; Ioana Marinescu; R. Thomas McCoy; Thomas L. Griffiths

arXiv:2502.20237·cs.LG·February 28, 2025

Open Access

TL;DR

This study examines the initial weights of a neural network as a source of inductive bias. It finds that initial weights can strongly affect performance and that, when they are optimized through meta-learning, the choice of architecture may matter less.

Contribution

The paper shows that initial weights are an important source of inductive bias and that optimizing them via meta-learning can narrow the performance differences between neural network architectures.
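To make the idea of meta-learning initial weights concrete, here is a minimal sketch. It is not the authors' method or code: it assumes a Reptile-style first-order update, a two-parameter linear model, and a made-up task family of lines with slopes near 2 and intercepts near -1. All function names (`sample_task`, `adapt`, `meta_learn_init`, `mean_adapted_loss`) and hyperparameters are hypothetical and chosen only for illustration.

```python
import random

def sample_task(rng):
    # Hypothetical task family (an assumption, not from the paper):
    # lines y = a*x + b with slopes near 2 and intercepts near -1.
    a = rng.gauss(2.0, 0.1)
    b = rng.gauss(-1.0, 0.1)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(10)]
    return xs, [a * x + b for x in xs]

def mse(params, xs, ys):
    # Mean squared error of the linear model y_hat = w*x + c.
    w, c = params
    return sum((w * x + c - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def adapt(params, xs, ys, steps=3, lr=0.1):
    # Inner loop: a few gradient-descent steps on one task, starting from params.
    w, c = params
    n = len(xs)
    for _ in range(steps):
        gw = sum(2 * (w * x + c - y) * x for x, y in zip(xs, ys)) / n
        gc = sum(2 * (w * x + c - y) for x, y in zip(xs, ys)) / n
        w, c = w - lr * gw, c - lr * gc
    return (w, c)

def meta_learn_init(rng, meta_steps=500, meta_lr=0.1):
    # Outer loop (Reptile-style): nudge the shared initial weights
    # toward the weights obtained after adapting to each sampled task.
    init = (0.0, 0.0)
    for _ in range(meta_steps):
        xs, ys = sample_task(rng)
        adapted = adapt(init, xs, ys)
        init = tuple(p + meta_lr * (q - p) for p, q in zip(init, adapted))
    return init

def mean_adapted_loss(init, rng, n_tasks=200):
    # Average loss after a few adaptation steps from a given initialization.
    # For brevity the loss is measured on the same points used for adaptation.
    total = 0.0
    for _ in range(n_tasks):
        xs, ys = sample_task(rng)
        total += mse(adapt(init, xs, ys), xs, ys)
    return total / n_tasks

if __name__ == "__main__":
    rng = random.Random(0)
    learned = meta_learn_init(rng)
    print("meta-learned init:", learned)
    print("loss from zero init:", mean_adapted_loss((0.0, 0.0), rng))
    print("loss from learned init:", mean_adapted_loss(learned, rng))
```

In this toy setting the same model, given the same small adaptation budget, should do better when it starts from the meta-learned initialization than from zeros, which is the sense in which initial weights act as an inductive bias.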

Findings

01. Meta-learning reduces performance disparities across architectures.

02. Initial weights significantly influence model generalization.

03. All architectures struggle with out-of-distribution problems.

Abstract

Artificial neural networks can acquire many aspects of human knowledge from data, making them promising as models of human learning. But what those networks can learn depends upon their inductive biases -- the factors other than the data that influence the solutions they discover -- and the inductive biases of neural networks remain poorly understood, limiting our ability to draw conclusions about human learning from the performance of these systems. Cognitive scientists and machine learning researchers often focus on the architecture of a neural network as a source of inductive bias. In this paper we explore the impact of another source of inductive bias -- the initial weights of the network -- using meta-learning as a tool for finding initial weights that are adapted for specific problems. We evaluate four widely-used architectures -- MLPs, CNNs, LSTMs, and Transformers -- by…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning