Mimetic Initialization of MLPs

Asher Trockman; J. Zico Kolter

arXiv:2602.07156·cs.LG·February 10, 2026

TL;DR

This paper extends mimetic initialization to multilayer perceptrons (MLPs), showing that giving the first layer a nonzero mean speeds up training on small-scale vision tasks such as CIFAR-10 and ImageNet-1k, and that the technique complements existing spatial mixing initializations.

Contribution

The paper presents the first attempt to apply mimetic initialization to channel mixing layers, specifically MLPs, and shows that giving the first layer a nonzero mean speeds up training.

Findings

01 Speed-ups in training on CIFAR-10 and ImageNet-1k

02 Complementary effects with spatial mixing initializations

03 Simple mean shift in first layer enhances MLP training

Abstract

Mimetic initialization uses pretrained models as case studies of good initialization, using observations of structures in trained weights to inspire new, simple initialization techniques. So far, it has been applied only to spatial mixing layers, such as convolutional, self-attention, and state space layers. In this work, we present the first attempt to apply the method to channel mixing layers, namely multilayer perceptrons (MLPs). Our extremely simple technique for MLPs -- to give the first layer a nonzero mean -- speeds up training on small-scale vision tasks like CIFAR-10 and ImageNet-1k. Though its effect is much smaller than that of spatial mixing initializations, it can be used in conjunction with them for an additional positive effect.
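
As a rough illustration of what "giving the first layer a nonzero mean" could look like, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the size of the shift, the base distribution, or whether the shift is applied per entry. The function name `init_mlp_first_layer`, the parameter `mean_shift`, the value 0.05, and the fan-in-scaled Gaussian base initialization are all assumptions made for the example.

```python
import math
import random


def init_mlp_first_layer(d_in, d_hidden, mean_shift=0.0, seed=0):
    """Return a d_hidden x d_in weight matrix as a list of rows.

    Entries are drawn from N(0, 1/d_in) (a common fan-in scaling, assumed
    here) and then shifted by the constant `mean_shift` (hypothetical
    parameter; the paper's actual value is not given in the abstract).
    """
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(d_in)
    return [
        [rng.gauss(0.0, std) + mean_shift for _ in range(d_in)]
        for _ in range(d_hidden)
    ]


def matrix_mean(w):
    """Average of all entries of a list-of-rows matrix."""
    total = sum(sum(row) for row in w)
    count = sum(len(row) for row in w)
    return total / count


# Same seed, so the two matrices differ only by the constant shift.
baseline = init_mlp_first_layer(64, 256, mean_shift=0.0)
shifted = init_mlp_first_layer(64, 256, mean_shift=0.05)
print(matrix_mean(baseline), matrix_mean(shifted))
```

In this sketch only the first linear layer of the MLP block is changed; the second layer would keep its usual zero-mean initialization, which matches the abstract's description of the technique as a first-layer modification.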

Taxonomy

Topics: Neural Networks and Reservoir Computing · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning