Language Embedding Meets Dynamic Graph: A New Exploration for Neural Architecture Representation Learning
Haizhao Jing, Haokui Zhang, Zhenhao Shang, Rong Xiao, Peng Wang, Yanning Zhang

TL;DR
This paper introduces LeDG-Former, a novel framework that combines language embeddings with dynamic graph learning to improve neural architecture representation learning, enabling zero-shot cross-hardware latency prediction and surpassing state-of-the-art results on standard benchmarks.
Contribution
LeDG-Former uniquely integrates language-based semantic embeddings with a dynamic graph transformer for neural architecture modeling, addressing two limitations of prior work: missing hardware information and static structural representations.
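The contribution rests on turning architecture and hardware information into text that a language model can embed. The paper's actual template is not reproduced on this page, so the sketch below uses a hypothetical `NODE_TEMPLATE` and a hypothetical `describe_node` helper purely to illustrate the idea: each operator and its target hardware are verbalized into one sentence, which would then be passed to a text encoder (not shown). Under this reading, a new hardware platform only requires a new description string, which is one plausible intuition behind zero-shot cross-hardware prediction.

```python
from typing import Dict, Union

# Hypothetical template; the wording used in the paper is not given here.
NODE_TEMPLATE = "A {op_type} operator with {attrs}, executed on {hardware}."


def describe_node(op_type: str, attrs: Dict[str, Union[int, str]], hardware: str) -> str:
    """Verbalize one operator and its target hardware into a single sentence."""
    # Sort attribute names so the same node always yields the same text.
    attr_text = ", ".join(f"{name} {value}" for name, value in sorted(attrs.items()))
    return NODE_TEMPLATE.format(
        op_type=op_type,
        attrs=attr_text or "no attributes",
        hardware=hardware,
    )


if __name__ == "__main__":
    # Illustrative inputs only; not taken from the paper.
    print(describe_node("Conv", {"out_channels": 64, "kernel_size": 3}, "a mobile CPU"))
    print(describe_node("ReLU", {}, "a server GPU"))
```

Running it prints `A Conv operator with kernel_size 3, out_channels 64, executed on a mobile CPU.` followed by `A ReLU operator with no attributes, executed on a server GPU.` How informative such strings are depends entirely on the template, which is exactly the concern the first review raises.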
Findings
Achieves state-of-the-art results on the NNLQP benchmark.
First to enable cross-hardware latency prediction.
Outperforms previous methods on NAS-Bench datasets.
Abstract
Neural Architecture Representation Learning aims to transform network models into feature representations for predicting network attributes, playing a crucial role in deploying and designing networks for real-world applications. Recently, inspired by the success of transformers, transformer-based models integrated with Graph Neural Networks (GNNs) have achieved significant progress in representation learning. However, current methods still have some limitations. First, existing methods overlook hardware attribute information, which conflicts with the current trend of diversified deep learning hardware and limits the practical applicability of models. Second, current encoding approaches rely on static adjacency matrices to represent topological structures, failing to capture the structural differences between computational nodes, which ultimately compromises encoding effectiveness. In…
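To make the "static adjacency matrix" encoding that the abstract criticizes concrete, here is a minimal sketch (not the paper's code). It assumes an architecture is given as a DAG edge list with integer node ids, and it uses a hypothetical masking convention in which each node attends only to itself and its direct predecessors. The function names `adjacency_matrix` and `one_hop_mask` are illustrative. Because the mask is derived purely from connectivity, a convolution and an activation with the same predecessors receive identical neighborhoods, which is one reading of the limitation described above.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def adjacency_matrix(num_nodes: int, edges: List[Edge]) -> List[List[int]]:
    """Static adjacency matrix: adj[src][dst] = 1 for every edge src -> dst."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        adj[src][dst] = 1
    return adj


def one_hop_mask(adj: List[List[int]]) -> List[List[bool]]:
    """mask[q][k] is True if query node q may attend to key node k.

    Hypothetical convention: a node attends to itself and its direct
    predecessors only. The mask is fixed once from the graph and is the
    same regardless of each node's operator type.
    """
    n = len(adj)
    return [[q == k or adj[k][q] == 1 for k in range(n)] for q in range(n)]


if __name__ == "__main__":
    # Illustrative residual block: 0 = input, 1 = conv, 2 = relu, 3 = add,
    # with a skip connection 0 -> 3.
    edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
    mask = one_hop_mask(adjacency_matrix(4, edges))
    for q, row in enumerate(mask):
        print(q, [k for k, allowed in enumerate(row) if allowed])
```

For this block the script prints `0 [0]`, `1 [0, 1]`, `2 [1, 2]`, and `3 [0, 2, 3]`: the add node sees the skip input and the activation, but nothing in the mask distinguishes what kind of operation each node performs.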
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The proposed framework effectively integrates neural architecture and hardware platform representations, advancing existing approaches. 2. The proposed dynamic graph self-attention mechanism reduces the computational cost of full attention while maintaining flexibility for specific local structures. 3. The experimental results appear strong and convincing.
1. Is the predefined template sufficiently informative for any neural architecture design? The manually defined template may be limited by the user’s knowledge. Could an automated approach be developed to extract more aligned concepts from both the neural architecture and the hardware platform? Furthermore, how are the hardware platform descriptions obtained? Their quality may significantly affect overall performance. 2. If the attended nodes are predefined by nearly two-hop parents, how can the
* The paper sets a new state of the art on a couple of problems where the input is a neural network architecture. This has broad use cases in "efficiency" [running faster] and in neural architecture search [finding more accurate networks without going through the training process]. * The paper introduces a GNN architecture that combines known techniques. * The paper combines LLMs with GNNs. This combination is an active area of research.
## Major weaknesses * The model is an attention model over multiple adjacency matrices, i.e., a straightforward combination of (MixHop, NGCN, or similar) with graph transformer architectures. * The word dynamic should be eliminated everywhere: title, section headers, abstract, and main text. The paper does *not* deal with dynamic graphs. It deals with static graphs. They only combine different adjacency hops from a fixed graph (like MixHop, etc.). Dynamic implies that the neural network
1. The paper addresses the problem of insufficient information usage in neural architecture representation learning methods. It identifies two key limitations, the overlooked hardware information and the heterogeneity of the nodes in the architecture graph, both of which are well motivated.
1. Although the language template representation is novel in the NAS domain, it is not a novel solution in general. In areas such as graph foundation models and graph learning, related techniques that use LLMs to transform graphs into tokens have already been explored. 2. The paper lacks an in-depth investigation of the proposed problem. The positive correlation between hardware information and the resulting latency is trivial to find. It would be better to provide more insights than only showing the pe
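The reviews characterize the attention pattern in two ways: one as nodes attending to roughly two-hop parents at lower cost than full attention, another as attention over several adjacency hops of a fixed graph, in the spirit of MixHop. Assuming "two-hop parents" means every ancestor reachable by a directed path of length one or two, the minimal sketch below (not the paper's implementation; all names are hypothetical) builds such a mask from the powers A, A^2, ..., A^k of a single static adjacency matrix and counts how many query-key pairs are scored compared with full attention.

```python
from typing import List, Tuple

BoolMatrix = List[List[bool]]


def bool_matmul(a: BoolMatrix, b: BoolMatrix) -> BoolMatrix:
    """Boolean matrix product: out[i][j] is True if some k has a[i][k] and b[k][j]."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def k_hop_parent_mask(num_nodes: int, edges: List[Tuple[int, int]], k: int) -> BoolMatrix:
    """mask[q][p] is True if p == q or p reaches q by a directed path of length 1..k.

    Built only from powers of one fixed adjacency matrix A.
    """
    adj = [[False] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        adj[src][dst] = True
    # Every node attends to itself.
    mask = [[p == q for p in range(num_nodes)] for q in range(num_nodes)]
    power = adj  # power[p][q]: path of length exactly `hop` from p to q
    for hop in range(1, k + 1):
        for p in range(num_nodes):
            for q in range(num_nodes):
                if power[p][q]:
                    mask[q][p] = True
        if hop < k:
            power = bool_matmul(power, adj)  # extend to paths of length hop + 1
    return mask


def attended_pairs(mask: BoolMatrix) -> int:
    """Number of (query, key) pairs the masked attention actually scores."""
    return sum(row.count(True) for row in mask)


if __name__ == "__main__":
    n = 6
    chain = [(i, i + 1) for i in range(n - 1)]  # illustrative 6-node chain
    for k in (1, 2):
        print(k, attended_pairs(k_hop_parent_mask(n, chain, k)), "of", n * n)
```

On the illustrative chain this prints `1 11 of 36` and `2 15 of 36`, showing the cost reduction relative to full attention. The mask is computed once from the graph, which is the point the second review raises; whether the paper's mechanism adds anything input-dependent beyond this cannot be determined from the excerpts here.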
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Advanced Neural Network Applications
