Towards Scalable and Versatile Weight Space Learning

Konstantin Schürholt; Michael W. Mahoney; Damian Borth

arXiv:2406.09997 · cs.LG · June 17, 2024

TL;DR

This paper presents SANE, a scalable, task-agnostic method for learning representations of neural network weights. It handles larger models of varying architectures than previous weight-space methods and can generate weights for unseen networks.

Contribution

SANE extends hyper-representations to sequentially embed neural network weights, enabling scalable, task-agnostic, and generative weight space learning for larger models.

Findings

01 SANE matches or exceeds state-of-the-art on weight representation benchmarks.

02 It effectively initializes models for new tasks.

03 SANE can generate unseen neural network models.

Abstract

Learning representations of well-trained neural network models holds the promise to provide an understanding of the inner workings of those models. However, previous work has either faced limitations when processing larger networks or was task-specific to either discriminative or generative tasks. This paper introduces the SANE approach to weight-space learning. SANE overcomes previous limitations by learning task-agnostic representations of neural networks that are scalable to larger models of varying architectures and that show capabilities beyond a single task. Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights, thus allowing one to embed larger neural networks as a set of tokens into the learned representation space. SANE reveals global model information from layer-wise embeddings, and it can sequentially generate…
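
The abstract describes embedding a network "as a set of tokens" by sequentially processing subsets of its weights. As a rough illustration of that idea only, and not the authors' implementation, the sketch below splits each layer's flattened weights into fixed-size, zero-padded tokens and then walks over the token sequence in windows. The function names (`tokenize_weights`, `windows`), the token layout, the zero-padding, and the toy layer sizes are all assumptions made for this example.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical token layout: (layer index, position within layer, values).
Token = Tuple[int, int, List[float]]


def tokenize_weights(layers: Dict[str, List[float]], token_size: int) -> List[Token]:
    """Split each layer's flattened weights into fixed-size tokens.

    The last chunk of a layer is zero-padded so every token has the
    same length (padding scheme is an assumption of this sketch).
    """
    tokens: List[Token] = []
    for layer_idx, flat in enumerate(layers.values()):
        for pos, start in enumerate(range(0, len(flat), token_size)):
            chunk = list(flat[start:start + token_size])
            chunk += [0.0] * (token_size - len(chunk))
            tokens.append((layer_idx, pos, chunk))
    return tokens


def windows(tokens: List[Token], window_size: int) -> Iterator[List[Token]]:
    """Yield consecutive windows of tokens, so a model with a fixed
    context length could process a sequence of any length piece by piece."""
    for start in range(0, len(tokens), window_size):
        yield tokens[start:start + window_size]


# Toy "model": two layers given as already-flattened weight lists.
toy_model = {
    "conv1.weight": [0.1] * 10,
    "fc.weight": [0.2] * 7,
}

tokens = tokenize_weights(toy_model, token_size=4)
chunks = list(windows(tokens, window_size=2))
```

Because the number of tokens grows with model size while each window stays fixed, a scheme of this kind decouples the size of the embedded network from the size of the model that embeds it, which is the scalability property the abstract emphasizes.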

Code & Models

Repositories

hsg-aiml/sane (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition

Methods: Sparse Evolutionary Training · Average Pooling · Global Average Pooling · Convolution · Max Pooling · Kaiming Initialization