LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters
Xinyu Zhou, Boris Knyazev, Alexia Jolicoeur-Martineau, Jie Fu

TL;DR
LoGAH introduces a low-rank hypernetwork approach that efficiently predicts parameters for extremely large neural networks, enabling better initialization and transfer learning in vision and language models.
Contribution
LoGAH presents a memory-efficient method to predict parameters of 774-million-parameter networks using low-rank decoders, improving over previous hypernetwork approaches.
Findings
LoGAH outperforms random initialization and existing hypernetworks in vision and language tasks.
LoGAH enables parameter prediction for 774-million-parameter models with reduced memory.
LoGAH shows promising transfer learning results when moving from small datasets to larger tasks.
Abstract
A good initialization of deep learning models is essential since it can help them converge better and faster. However, pretraining large models is unaffordable for many researchers, which makes predicting good initial parameters increasingly important. Graph HyperNetworks (GHNs), one approach to predicting model parameters, have recently shown strong performance in initializing large vision models. Unfortunately, predicting parameters of very wide networks relies on copying small chunks of parameters multiple times and requires an extremely large number of parameters to support full prediction, which greatly hinders its adoption in practice. To address this limitation, we propose LoGAH (Low-rank GrAph Hypernetworks), a GHN with a low-rank parameter decoder that expands to significantly wider networks without requiring as excessive an increase in parameters as in previous…
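The low-rank decoder idea mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' architecture; it is a toy, pure-Python example that assumes a single linear decoder mapping a node embedding to two factors A and B, whose product gives the predicted weight matrix. The class name LowRankDecoder, the helper functions, and all sizes are hypothetical and chosen only for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def linear(vec, weight):
    """Apply a linear map; weight has shape (len(vec), n_out)."""
    return matmul([vec], weight)[0]


def reshape(flat, rows, cols):
    """Reshape a flat list into a rows x cols matrix."""
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for learned decoder weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LowRankDecoder:
    """Hypothetical decoder: embedding -> factors A, B; predicted W = A @ B."""

    def __init__(self, embed_dim, d_out, d_in, rank, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        # Two output heads, one per low-rank factor.
        self.to_a = rand_matrix(embed_dim, d_out * rank, rng)
        self.to_b = rand_matrix(embed_dim, rank * d_in, rng)

    def num_params(self):
        """Decoder weights needed to emit both factors."""
        return self.embed_dim * self.rank * (self.d_out + self.d_in)

    def predict(self, embedding):
        """Predict a (d_out x d_in) weight matrix from one node embedding."""
        a = reshape(linear(embedding, self.to_a), self.d_out, self.rank)
        b = reshape(linear(embedding, self.to_b), self.rank, self.d_in)
        return matmul(a, b)


def full_decoder_params(embed_dim, d_out, d_in):
    """Decoder weights needed to emit every entry of W directly."""
    return embed_dim * d_out * d_in


decoder = LowRankDecoder(embed_dim=8, d_out=64, d_in=64, rank=4)
weight = decoder.predict([0.5] * 8)
low_rank_count = decoder.num_params()
full_count = full_decoder_params(8, 64, 64)
print(len(weight), len(weight[0]), low_rank_count, full_count)
```

With these toy sizes the low-rank output heads hold 4096 weights, while a head that emits all 64 x 64 entries directly would need 32768. The gap grows with layer width, because the low-rank cost scales with the sum of the two dimensions rather than their product.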
Taxonomy
Topics: Machine Learning in Materials Science · Graph Theory and Algorithms · Advanced Graph Neural Networks
