NAR-Former V2: Rethinking Transformer for Universal Neural Network Representation Learning
Yun Yi, Haokui Zhang, Rong Xiao, Nannan Wang, Xiaoyu Wang

TL;DR
This paper introduces NAR-Former V2, a Transformer-based model that learns representations of both cell-structured networks and entire networks. It outperforms the GNN-based NNLP method in latency estimation and achieves accuracy-prediction results comparable to the state of the art.
Contribution
The paper proposes a modified Transformer-based model for universal neural network representation learning. It adds graph encoding and the inductive learning capability of GNNs to the Transformer, with the aim of better generalization than existing methods.
Findings
Surpasses the GNN-based NNLP method in latency estimation on the NNLQP dataset.
Achieves accuracy prediction results comparable to the state of the art on the NASBench101 and NASBench201 datasets.
Enhances the Transformer with graph encoding and inductive learning for better generalization.
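One common way to give a Transformer graph awareness is to restrict self-attention to the edges of the network's computation graph. The sketch below illustrates that general idea only; it is not taken from the paper and should not be read as the authors' implementation. The function name `masked_attention`, the identity query/key/value projections, and the three-operation example graph are all assumptions made for illustration.

```python
import math

def masked_attention(features, adjacency):
    """Single-head self-attention restricted to graph neighbours (illustrative only).

    features:  list of n node feature vectors (each a list of floats)
    adjacency: n x n 0/1 matrix; adjacency[i][j] == 1 lets node i attend to node j
    """
    n, d = len(features), len(features[0])
    out = []
    for i in range(n):
        # Each node attends to itself plus the nodes allowed by the adjacency matrix.
        allowed = [j for j in range(n) if j == i or adjacency[i][j]]
        # Scaled dot-product scores (identity projections, an assumption for brevity).
        scores = [sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(d)
                  for j in allowed]
        # Numerically stable softmax over the allowed nodes only.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the allowed nodes' features.
        out.append([sum(w * features[j][k] for w, j in zip(weights, allowed))
                    for k in range(d)])
    return out

# Hypothetical chain of three operations: op0 -> op1 -> op2.
# Each node may attend to its direct predecessor (and to itself).
adjacency = [[0, 0, 0],
             [1, 0, 0],
             [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded = masked_attention(features, adjacency)
```

Because the mask depends only on local connectivity rather than on a fixed sequence length, the same function applies unchanged to graphs of any size, which is one intuition for why graph-restricted attention can generalize to unseen architectures.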
Abstract
As more deep learning models are being applied in real-world applications, there is a growing need for modeling and learning the representations of neural networks themselves. An efficient representation can be used to predict target attributes of networks without the need for actual training and deployment procedures, facilitating efficient network deployment and design. Recently, inspired by the success of Transformer, some Transformer-based representation learning frameworks have been proposed and achieved promising performance in handling cell-structured models. However, graph neural network (GNN) based approaches still dominate the field of learning representation for the entire network. In this paper, we revisit Transformer and compare it with GNN to analyse their different architecture characteristics. We then propose a modified Transformer-based universal neural network…
Taxonomy
Topics: Advanced Graph Neural Networks · Brain Tumor Detection and Classification · Machine Learning and ELM
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer · Absolute Position Encodings
