Towards Scalable and Versatile Weight Space Learning
Konstantin Schürholt, Michael W. Mahoney, Damian Borth

TL;DR
This paper presents SANE, a scalable, task-agnostic method for learning representations of neural network weights. It handles larger models than previous approaches and can generate unseen networks.
Contribution
SANE extends hyper-representations to sequentially embed neural network weights, enabling scalable, task-agnostic, and generative weight space learning for larger models.
Findings
SANE matches or exceeds state-of-the-art on weight representation benchmarks.
It effectively initializes models for new tasks.
SANE can generate unseen neural network models.
Abstract
Learning representations of well-trained neural network models holds the promise to provide an understanding of the inner workings of those models. However, previous work has either faced limitations when processing larger networks or was task-specific to either discriminative or generative tasks. This paper introduces the SANE approach to weight-space learning. SANE overcomes previous limitations by learning task-agnostic representations of neural networks that are scalable to larger models of varying architectures and that show capabilities beyond a single task. Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights, thus allowing one to embed larger neural networks as a set of tokens into the learned representation space. SANE reveals global model information from layer-wise embeddings, and it can sequentially generate…
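The abstract describes embedding a network as a set of tokens by sequentially processing subsets of its weights. A minimal sketch of that idea follows, assuming a simple scheme: flatten each layer, cut it into fixed-size chunks, and zero-pad the last chunk. The helper names `tokenize_weights` and `iter_windows`, the token size, and the (layer, index) position format are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def tokenize_weights(layers, token_size):
    """Split each layer's flattened weights into fixed-size tokens.

    Hypothetical scheme for illustration: the last token of every layer
    is zero-padded, and each token records (layer index, index in layer).
    """
    tokens, positions = [], []
    for layer_idx, w in enumerate(layers):
        flat = np.asarray(w, dtype=np.float32).ravel()
        n_tokens = -(-flat.size // token_size)  # ceiling division
        padded = np.zeros(n_tokens * token_size, dtype=np.float32)
        padded[: flat.size] = flat
        tokens.append(padded.reshape(n_tokens, token_size))
        positions.extend((layer_idx, k) for k in range(n_tokens))
    return np.concatenate(tokens, axis=0), np.array(positions, dtype=np.int64)


def iter_windows(tokens, positions, window):
    """Yield consecutive token subsets so a model never sees the full sequence."""
    for start in range(0, len(tokens), window):
        yield tokens[start : start + window], positions[start : start + window]


# Toy "network": a conv kernel, its bias, and a linear layer (random values).
rng = np.random.default_rng(0)
layers = [
    rng.normal(size=(8, 3, 3, 3)),
    rng.normal(size=(8,)),
    rng.normal(size=(10, 8)),
]
tokens, positions = tokenize_weights(layers, token_size=32)
windows = list(iter_windows(tokens, positions, window=4))
print(tokens.shape)  # (11, 32)
```

Because each window has a fixed length regardless of model size, an encoder operating on windows has a memory cost that does not grow with the number of parameters in the embedded network, which is the scaling property the abstract points to.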
Taxonomy
Topics: Human Pose and Action Recognition
Methods: Sparse Evolutionary Training · Average Pooling · Global Average Pooling · Convolution · Max Pooling · Kaiming Initialization
