Mimetic Initialization of MLPs
Asher Trockman, J. Zico Kolter

TL;DR
This paper extends mimetic initialization to multilayer perceptrons (MLPs), showing that giving the first layer a nonzero mean speeds up training on small-scale vision tasks and can be combined with existing spatial mixing initializations.
Contribution
The paper presents the first application of mimetic initialization to channel mixing layers, specifically MLPs, and shows that giving the first layer a nonzero mean speeds up training.
Findings
Speed-ups in training on CIFAR-10 and ImageNet-1k
Complementary effects with spatial mixing initializations
Simple mean shift in first layer enhances MLP training
Abstract
Mimetic initialization uses pretrained models as case studies of good initialization, using observations of structures in trained weights to inspire new, simple initialization techniques. So far, it has been applied only to spatial mixing layers, such as convolutional, self-attention, and state space layers. In this work, we present the first attempt to apply the method to channel mixing layers, namely multilayer perceptrons (MLPs). Our extremely simple technique for MLPs -- to give the first layer a nonzero mean -- speeds up training on small-scale vision tasks like CIFAR-10 and ImageNet-1k. Though its effect is much smaller than spatial mixing initializations, it can be used in conjunction with them for an additional positive effect.
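As a rough illustration of what "giving the first layer a nonzero mean" could look like, the sketch below draws a zero-mean Gaussian weight matrix and adds a constant offset to every entry. This is not the authors' code: the function names, the `mean_shift` parameter and its value of 0.05, the 1/sqrt(fan_in) scaling, and the choice to shift the weights (rather than, say, the bias) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def init_first_layer(fan_in, fan_out, mean_shift=0.0, seed=0):
    """Return a fan_out x fan_in weight matrix as a list of rows.

    Entries are Gaussian with std 1/sqrt(fan_in), plus a constant offset.
    `mean_shift` is a hypothetical knob; the abstract gives no value.
    """
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(fan_in)
    return [
        [rng.gauss(0.0, std) + mean_shift for _ in range(fan_in)]
        for _ in range(fan_out)
    ]


def matrix_mean(weights):
    """Average over all entries of a list-of-rows matrix."""
    total = sum(sum(row) for row in weights)
    return total / (len(weights) * len(weights[0]))


# Same seed for both, so the two matrices differ only by the offset.
baseline = init_first_layer(fan_in=64, fan_out=256)
shifted = init_first_layer(fan_in=64, fan_out=256, mean_shift=0.05)

print("baseline mean:", matrix_mean(baseline))
print("shifted mean: ", matrix_mean(shifted))
```

In a real framework the same idea would amount to adding a constant to the first linear layer's weight tensor after the default initializer has run; later layers would be left untouched under this reading of the abstract.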
Taxonomy
Topics: Neural Networks and Reservoir Computing · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
