Graph Metanetworks for Processing Diverse Neural Architectures
Derek Lim, Haggai Maron, Marc T. Law, Jonathan Lorraine, James Lucas

TL;DR
This paper introduces Graph Metanetworks (GMNs), which represent the weights of diverse neural network architectures as graphs and process those graphs with graph neural networks, so that a single metanetwork can generalize across architectures while respecting their parameter symmetries.
Contribution
The paper presents Graph Metanetworks, a new method that generalizes to various neural architectures and handles symmetries, overcoming limitations of previous specialized approaches.
Findings
GMNs are expressive and permutation-equivariant.
Effective across multiple neural network architectures.
Outperform existing methods on metanetwork tasks.
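Permutation equivariance matters because the hidden neurons of a network can be reordered without changing the function it computes. The sketch below is a minimal illustration under that assumption, not code from the paper. It uses plain Python and hypothetical helper names (`mlp_forward`, `permute_hidden`) to check this for a small two-layer MLP: the raw weight lists change, but the outputs do not. A metanetwork that reads the weights as a flat vector would see two different inputs. One that respects this symmetry, as GMNs are described to, should treat them consistently.

```python
import math
import random


def mlp_forward(x, w1, b1, w2, b2):
    """Two-layer MLP, y = W2 relu(W1 x + b1) + b2, using plain Python lists."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


def permute_hidden(w1, b1, w2, perm):
    """Reorder hidden neurons: rows of W1, entries of b1, and the matching columns of W2."""
    w1_p = [w1[i] for i in perm]
    b1_p = [b1[i] for i in perm]
    w2_p = [[row[i] for i in perm] for row in w2]
    return w1_p, b1_p, w2_p


random.seed(0)
n_in, n_hidden, n_out = 3, 4, 2
w1 = [[random.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [random.gauss(0.0, 1.0) for _ in range(n_hidden)]
w2 = [[random.gauss(0.0, 1.0) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [random.gauss(0.0, 1.0) for _ in range(n_out)]

perm = [2, 0, 3, 1]  # an arbitrary reordering of the 4 hidden neurons
w1_p, b1_p, w2_p = permute_hidden(w1, b1, w2, perm)

x = [0.5, -1.0, 2.0]
y = mlp_forward(x, w1, b1, w2, b2)
y_p = mlp_forward(x, w1_p, b1_p, w2_p, b2)

print(w1_p != w1)  # True: the raw parameters differ
# True: same function (up to floating-point summation order)
print(all(math.isclose(a, c, rel_tol=1e-9, abs_tol=1e-12) for a, c in zip(y, y_p)))
```

The comparison uses `math.isclose` rather than `==` because reordering the hidden units also reorders the floating-point sums.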
Abstract
Neural networks efficiently encode learned information within their parameters. Consequently, many tasks can be unified by treating neural networks themselves as input data. When doing so, recent studies demonstrated the importance of accounting for the symmetries and geometry of parameter spaces. However, those works developed architectures tailored to specific networks such as MLPs and CNNs without normalization layers, and generalizing such architectures to other types of networks can be challenging. In this work, we overcome these challenges by building new metanetworks - neural networks that take weights from other neural networks as input. Put simply, we carefully build graphs representing the input neural networks and process the graphs using graph neural networks. Our approach, Graph Metanetworks (GMNs), generalizes to neural architectures where competing methods struggle, such…
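To make "build graphs representing the input neural networks" concrete, here is a minimal sketch for the simplest case, an MLP. It assumes a parameter graph in which every neuron is a node, every weight is a directed edge carrying its value, and each bias is stored as a feature on its neuron. The names `ParameterGraph` and `mlp_to_graph` are hypothetical. The paper's graphs for other layer types, and its exact feature choices, may differ. A graph neural network would then run message passing over this structure.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterGraph:
    """Nodes are neurons; each weight becomes a directed edge (src, dst, value)."""
    node_features: list = field(default_factory=list)  # bias per node (0.0 for input neurons)
    node_layer: list = field(default_factory=list)     # layer index of each node
    edges: list = field(default_factory=list)          # (src, dst, weight) triples


def mlp_to_graph(layer_sizes, weights, biases):
    """Build a parameter graph for an MLP.

    weights[l][j][i] connects neuron i of layer l to neuron j of layer l + 1;
    biases[l][j] is the bias of neuron j in layer l + 1.
    """
    g = ParameterGraph()
    offsets = []  # index of the first node in each layer
    for layer, size in enumerate(layer_sizes):
        offsets.append(len(g.node_features))
        for j in range(size):
            # Input neurons have no bias; later neurons store theirs as a node feature.
            g.node_features.append(0.0 if layer == 0 else biases[layer - 1][j])
            g.node_layer.append(layer)
    for l, w in enumerate(weights):
        for j, row in enumerate(w):
            for i, value in enumerate(row):
                g.edges.append((offsets[l] + i, offsets[l + 1] + j, value))
    return g


# A 2-3-1 MLP with made-up weights for illustration.
sizes = [2, 3, 1]
weights = [
    [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]],  # 3 x 2
    [[0.7, -0.8, 0.9]],                       # 1 x 3
]
biases = [[0.01, 0.02, 0.03], [0.04]]
g = mlp_to_graph(sizes, weights, biases)
print(len(g.node_features), len(g.edges))  # 6 9
```

Because nodes are tied to neurons rather than to positions in a flat vector, permuting hidden neurons only relabels nodes, which a graph neural network handles naturally.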
Peer Reviews
Decision: ICLR 2024 spotlight
It should be noted that I am not familiar with other papers on metanetworks and am basing my assessment largely on the information in this paper. That being said, assuming the paper represents the current state of this area fairly, it seems to be an impressive paper. It appears to address scaling issues with similarly expressive network representations and to expand the variety of representable networks, both of which seem to be valuable contributions.
They address most of the concerns I had while reading, including some comparisons against the newer state-of-the-art metanets mentioned, which were excluded from most of the results because they cannot represent certain types of layers. Notably, a cited competing method, 'NFN', was left out of this comparison, though it seems to be addressed in the appendix; as it apparently deals exclusively with MLPs, I believe it is fair not to compare it with the proposed method.
The proposed GMNs can generalise to different types of neural architectures. Unlike previous works such as Navon et al. (2023), which were tailored to specific networks, GMNs can handle a wide range of architectures, including those with complex modules such as attention blocks. Building on the main idea of Zhang et al. (2023), processing weights with graph networks, this paper extends the method to a variety of neural layers that are common in modern architectures. The authors also provide a
- Although different neural layers and architectures can be encoded in the proposed parameter graph representation, information in the spatial domain is missing (e.g., translation equivariance and the receptive field in ConvNets). This could be an inherent, general limitation of the proposed method as well as of other related work in the weight domain.
- There are also a variety of non-parametric operations in feed-forward neural networks that the paper does not address, e.g., pooling la
The paper has several good contributions:
1. The paper addresses an interesting and promising topic: representing neural networks and their weights.
2. The proposed parameter graph is a reasonable approach and looks much simpler than previous works such as DWSNets and NFN/NFT. It is similar to the concurrent work of Zhang et al. (2023), which is properly credited.
3. The description of how to build graphs for different layer types, such as convolution, self-attention, and residual connections, is v
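The reviews discuss building graphs for convolutions and note that spatial information is a concern. One plausible encoding, offered here as an assumption rather than as the paper's exact construction, treats each channel as a node and turns every kernel entry into its own edge from input channel to output channel. The entry's spatial offset is attached as extra edge features. Two channels are therefore joined by several parallel edges, one per kernel position. The helper `conv_to_edges` and the node-offset arguments are hypothetical.

```python
def conv_to_edges(kernel, in_offset, out_offset):
    """Turn a conv kernel indexed kernel[o][i][dy][dx] into parameter-graph edges.

    Channels are nodes: input channel i -> node in_offset + i,
    output channel o -> node out_offset + o.
    Each kernel entry becomes an edge (src, dst, weight, dy, dx).
    """
    edges = []
    for o, per_in in enumerate(kernel):
        for i, patch in enumerate(per_in):
            for dy, row in enumerate(patch):
                for dx, value in enumerate(row):
                    # Parallel edges between the same channel pair, told apart by (dy, dx).
                    edges.append((in_offset + i, out_offset + o, value, dy, dx))
    return edges


# Made-up 3x3 conv with 2 input channels and 4 output channels.
kernel = [[[[0.1 * (o + i + dy + dx) for dx in range(3)] for dy in range(3)]
           for i in range(2)] for o in range(4)]
edges = conv_to_edges(kernel, in_offset=0, out_offset=2)
pairs = {(s, d) for s, d, *_ in edges}
print(len(edges))  # 72 = 4 * 2 * 3 * 3
print(len(pairs))  # 8 distinct channel pairs
```

With this layout, the graph size grows with the number of channels rather than with image resolution. The (dy, dx) features keep some spatial structure, though they do not by themselves encode translation equivariance or the receptive field.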
The paper has several weaknesses. I'm willing to revise the rating based on the authors' response.
1. The paper says that "While our graphs are DAGs, we are free to use undirected edges". Would the direction of edges not be a useful feature in some cases? For example, networks sometimes take multiple inputs and have multiple outputs, so there is no way to differentiate inputs from outputs unless edge direction is used.
2. The computational complexity is not analyzed relative to other approaches. Can the model
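The first question above, whether edge direction is a useful feature, can be illustrated with a toy example. The sketch below is a hypothetical comparison and not the authors' method. It assumes a single scalar feature per node and summation as the aggregator. It contrasts one undirected message-passing step with a direction-aware step that aggregates incoming and outgoing neighbours separately. On a symmetric chain input -> hidden -> output, the undirected step gives the input and output nodes identical features, while the directed step tells them apart.

```python
def directed_message_pass(node_feats, edges):
    """One message-passing step that keeps edge direction.

    node_feats: one scalar feature per node.
    edges: (src, dst, weight) triples.
    Returns [own feature, sum over incoming edges, sum over outgoing edges] per node.
    """
    n = len(node_feats)
    incoming = [0.0] * n
    outgoing = [0.0] * n
    for src, dst, w in edges:
        incoming[dst] += w * node_feats[src]  # message along the edge direction
        outgoing[src] += w * node_feats[dst]  # message against the edge direction
    return [[node_feats[v], incoming[v], outgoing[v]] for v in range(n)]


def undirected_message_pass(node_feats, edges):
    """Same step, but both directions are pooled into a single aggregate."""
    n = len(node_feats)
    agg = [0.0] * n
    for src, dst, w in edges:
        agg[dst] += w * node_feats[src]
        agg[src] += w * node_feats[dst]
    return [[node_feats[v], agg[v]] for v in range(n)]


# A chain input (0) -> hidden (1) -> output (2) with equal weights and features.
feats = [1.0, 1.0, 1.0]
edges = [(0, 1, 2.0), (1, 2, 2.0)]

und_out = undirected_message_pass(feats, edges)
dir_out = directed_message_pass(feats, edges)
print(und_out[0] == und_out[2])  # True: input and output nodes look identical
print(dir_out[0], dir_out[2])    # [1.0, 0.0, 2.0] [1.0, 2.0, 0.0]
```

Keeping the two aggregates separate roughly doubles the per-node message features. That is a small cost compared with losing the input/output distinction.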
Taxonomy
Topics: Neural Networks and Applications · Machine Learning in Materials Science
Methods: Residual Connection · Convolution · Average Pooling · 1x1 Convolution · Global Average Pooling · Residual Block · Batch Normalization · Max Pooling · Kaiming Initialization
