LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters

Xinyu Zhou; Boris Knyazev; Alexia Jolicoeur-Martineau; Jie Fu

arXiv:2405.16287·cs.LG·May 28, 2024

TL;DR

LoGAH introduces a low-rank hypernetwork approach that efficiently predicts parameters for extremely large neural networks, enabling better initialization and transfer learning in vision and language models.

Contribution

LoGAH presents a memory-efficient method to predict parameters of 774-million-parameter networks using low-rank decoders, improving over previous hypernetwork approaches.

Findings

01. LoGAH outperforms random initialization and existing hypernetworks in vision and language tasks.

02. LoGAH enables parameter prediction for 774-million-parameter models with reduced memory.

03. LoGAH shows promising transfer learning results from small datasets to larger tasks.

Abstract

A good initialization of deep learning models is essential since it can help them converge better and faster. However, pretraining large models is unaffordable for many researchers, which makes predicting good initial parameters increasingly necessary. Graph HyperNetworks (GHNs), one approach to predicting model parameters, have recently shown strong performance in initializing large vision models. Unfortunately, predicting parameters of very wide networks relies on copying small chunks of parameters multiple times and requires an extremely large number of parameters to support full prediction, which greatly hinders their adoption in practice. To address this limitation, we propose LoGAH (Low-rank GrAph Hypernetworks), a GHN with a low-rank parameter decoder that expands to significantly wider networks without requiring as excessive an increase of parameters as in previous…
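The idea of a low-rank parameter decoder mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a decoder that maps a node embedding `h` to two thin factors and multiplies them to form a weight matrix, and every name and size here (`decode_low_rank`, `proj_a`, `proj_b`, `d`, `r`) is hypothetical. It only shows why predicting factors needs far fewer decoder parameters than predicting a dense matrix directly.

```python
import numpy as np

def full_decoder_params(d, c_out, c_in):
    """Weights in one linear map from a d-dim embedding to a dense c_out x c_in matrix."""
    return d * c_out * c_in

def low_rank_decoder_params(d, c_out, c_in, r):
    """Weights in two linear maps that emit a (c_out x r) and an (r x c_in) factor."""
    return d * r * (c_out + c_in)

def decode_low_rank(h, proj_a, proj_b, c_out, c_in, r):
    """Predict a weight matrix as the product of two factors decoded from h."""
    a = (h @ proj_a).reshape(c_out, r)   # tall factor
    b = (h @ proj_b).reshape(r, c_in)    # wide factor
    return a @ b                         # c_out x c_in matrix of rank at most r

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
d, c_out, c_in, r = 16, 64, 48, 4
h = rng.standard_normal(d)                        # stand-in for a node embedding
proj_a = rng.standard_normal((d, c_out * r))      # decoder weights for factor A
proj_b = rng.standard_normal((d, r * c_in))       # decoder weights for factor B

w = decode_low_rank(h, proj_a, proj_b, c_out, c_in, r)
print(w.shape)
print(full_decoder_params(d, c_out, c_in), low_rank_decoder_params(d, c_out, c_in, r))
```

In this toy setting the decoder size grows with `r * (c_out + c_in)` rather than `c_out * c_in`, so widening the target network increases the decoder only linearly in the layer width. The trade-off is that each predicted matrix has rank at most `r`.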

Code & Models

Repositories

blackzxy/logah (PyTorch, official)

Taxonomy

Topics: Machine Learning in Materials Science · Graph Theory and Algorithms · Advanced Graph Neural Networks