Principled Weight Initialization for Hypernetworks

Oscar Chang; Lampros Flokas; Hod Lipson

arXiv:2312.08399·cs.LG·December 15, 2023·35 cites

Open Access

TL;DR

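This paper introduces principled weight initialization methods for hypernetworks. Classical initialization schemes, applied directly to a hypernet, produce mainnet weights at the wrong scale; the proposed methods correct this, yielding more stable mainnet weights, lower training loss, and faster convergence.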

Contribution

It proposes initialization techniques designed specifically for hypernetworks, whose optimization had not previously been studied.

Findings

01 More stable mainnet weights during training

02 Lower training loss achieved with new initialization

03 Faster convergence compared to traditional methods

Abstract

Hypernetworks are meta neural networks that generate weights for a main neural network in an end-to-end differentiable manner. Despite extensive applications ranging from multi-task learning to Bayesian deep learning, the problem of optimizing hypernetworks has not been studied to date. We observe that classical weight initialization methods like Glorot & Bengio (2010) and He et al. (2015), when applied directly on a hypernet, fail to produce weights for the mainnet in the correct scale. We develop principled techniques for weight initialization in hypernets, and show that they lead to more stable mainnet weights, lower training loss, and faster convergence.
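
The scale problem described in the abstract can be illustrated with a small numerical sketch. It is not the paper's implementation. It assumes a bias-free linear hypernet `H` that maps a unit-variance embedding `e` to the flattened weights of one mainnet layer, and it uses `1 / fan_in` as the target mainnet weight variance. The function name, the layer sizes, and the variance-matched scaling `1 / (embed_dim * fan_in)` are illustrative assumptions; the paper's exact formulas may differ.

```python
import numpy as np


def generated_weight_variance(fan_in, fan_out, embed_dim, hyper_var, rng):
    """Sample a linear hypernet and return the variance of the mainnet weights it emits.

    Hypothetical helper for illustration: W = reshape(H @ e), with no bias term.
    """
    # Assumption: the embedding has zero mean and unit variance per entry.
    e = rng.standard_normal(embed_dim)
    # Hypernet weights drawn i.i.d. with the requested variance.
    H = rng.standard_normal((fan_out * fan_in, embed_dim)) * np.sqrt(hyper_var)
    # Each generated weight is a sum of embed_dim products, so its variance
    # is roughly embed_dim * hyper_var, regardless of the mainnet's fan_in.
    W = (H @ e).reshape(fan_out, fan_in)
    return W.var()


rng = np.random.default_rng(0)
fan_in, fan_out, embed_dim = 256, 128, 64

# Variance a fan-in style initialization would want for the mainnet layer.
target_var = 1.0 / fan_in

# Classical fan-in initialization applied to the hypernet itself:
# it only sees the hypernet's own fan-in (embed_dim).
naive_var = generated_weight_variance(
    fan_in, fan_out, embed_dim, 1.0 / embed_dim, rng
)

# Variance-matched alternative (an assumption of this sketch):
# also divide by the mainnet's fan_in so the generated weights hit the target.
scaled_var = generated_weight_variance(
    fan_in, fan_out, embed_dim, 1.0 / (embed_dim * fan_in), rng
)

print(f"target  Var(W): {target_var:.5f}")
print(f"naive   Var(W): {naive_var:.5f}")
print(f"scaled  Var(W): {scaled_var:.5f}")
```

Under these assumptions, the classical scheme yields generated weights whose variance is on the order of 1 rather than `1 / fan_in`, because it accounts only for the hypernet's own input width. Folding the mainnet's `fan_in` into the hypernet's initialization variance brings the generated weights back to the intended scale.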

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications