TL;DR
DGPO is a reinforcement-learning-guided graph diffusion method for directed acyclic graphs (DAGs). Applied to neural architecture search, it learns transferable structural priors and matches the benchmark optimum on all three NAS-Bench-201 tasks.
Contribution
The paper extends discrete graph diffusion models to directed acyclic graphs via topological node ordering and positional encoding, enabling RL fine-tuning for neural architecture generation.
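To make the idea of topological node ordering plus positional encoding concrete, here is a minimal sketch. It is not the paper's implementation: the function names (`topological_order`, `sinusoidal_encoding`, `encode_dag`), the use of Kahn's algorithm, and the Transformer-style sinusoidal encoding are all assumptions chosen for illustration. It only shows how each node of a DAG could receive a feature vector that reflects its position in the data flow.

```python
import math
from collections import deque


def topological_order(num_nodes, edges):
    """Return the nodes of a DAG in a topological order (Kahn's algorithm)."""
    successors = [[] for _ in range(num_nodes)]
    in_degree = [0] * num_nodes
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1

    # Start from nodes with no incoming edges and peel the graph layer by layer.
    ready = deque(n for n in range(num_nodes) if in_degree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)

    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


def sinusoidal_encoding(position, dim):
    """Sinusoidal encoding of an integer position (assumed form, not from the paper)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    encoding = []
    for i in range(0, dim, 2):
        angle = position / (10000 ** (i / dim))
        encoding.extend([math.sin(angle), math.cos(angle)])
    return encoding


def encode_dag(num_nodes, edges, dim=8):
    """Map each node to an encoding of its rank in the topological order."""
    order = topological_order(num_nodes, edges)
    rank = {node: pos for pos, node in enumerate(order)}
    return {node: sinusoidal_encoding(rank[node], dim) for node in range(num_nodes)}


if __name__ == "__main__":
    # A small cell-like DAG: input 0 feeds nodes 1 and 2, which both feed output 3.
    cell_edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
    print(topological_order(4, cell_edges))
    print(encode_dag(4, cell_edges, dim=4)[3])
```

Because every edge points from a lower rank to a higher one, a model that receives these encodings can tell a node's predecessors from its successors, which an undirected adjacency matrix alone cannot express.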
Findings
DGPO matches the benchmark optimum on all three NAS-Bench-201 tasks (91.61%, 73.49%, 46.77%).
Pretrained on 7% of the search space, DGPO generates near-oracle architectures.
Inverse optimization confirms reward-driven steering.
Abstract
Reinforcement learning fine-tuning has proven effective for steering generative diffusion models toward desired properties in image and molecular domains. Graph diffusion models have similarly been applied to combinatorial structure generation, including neural architecture search (NAS). However, neural architectures are directed acyclic graphs (DAGs) where edge direction encodes functional semantics such as data flow, information that existing graph diffusion methods, designed for undirected structures, discard. We propose Directed Graph Policy Optimization (DGPO), which extends reinforcement learning fine-tuning of discrete graph diffusion models to DAGs via topological node ordering and positional encoding. Validated on NAS-Bench-101 and NAS-Bench-201, DGPO matches the benchmark optimum on all three NAS-Bench-201 tasks (91.61%, 73.49%, 46.77%). The central finding is that the model…
