Architecture-Aware Learning Curve Extrapolation via Graph Ordinary Differential Equation
Yanna Ding, Zijie Huang, Xiao Shou, Yihang Guo, Yizhou Sun, Jianxi Gao

TL;DR
This paper introduces an architecture-aware neural differential equation model that improves learning curve extrapolation by incorporating neural network architecture information, outperforming existing methods in predicting training performance.
Contribution
The paper proposes a novel neural differential equation model that integrates neural architecture information for more accurate learning curve extrapolation.
Findings
Outperforms state-of-the-art learning curve extrapolation methods.
Effectively captures fluctuating learning curves and quantifies uncertainty.
Enhances neural architecture search by improving training configuration ranking.
Abstract
Learning curve extrapolation predicts neural network performance from early training epochs and has been applied to accelerate AutoML, facilitating hyperparameter tuning and neural architecture search. However, existing methods typically model the evolution of learning curves in isolation, neglecting the impact of neural network (NN) architectures, which influence the loss landscape and learning trajectories. In this work, we explore whether incorporating neural network architecture improves learning curve modeling and how to effectively integrate this architectural information. Motivated by the dynamical system view of optimization, we propose a novel architecture-aware neural differential equation model to forecast learning curves continuously. We empirically demonstrate its ability to capture the general trend of fluctuating learning curves while quantifying uncertainty through…
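As a rough illustration of the idea in the abstract, the sketch below conditions a learning-curve ODE on an embedding of the architecture graph and integrates it forward with Euler steps. It is a toy under stated assumptions, not the authors' model. The function names, the single round of mean message passing, the saturating form dy/dt = rate(z) * (ceiling(z) - y), and the hand-set weights are all hypothetical. In the paper the dynamics are a learned neural differential equation.

```python
import math

def embed_architecture(adjacency, node_features):
    """Hypothetical graph encoder: one round of mean message passing
    (each node averages itself with its predecessors), then mean pooling."""
    n = len(adjacency)
    dim = len(node_features[0])
    hidden = []
    for i in range(n):
        # adjacency[j][i] == 1 means there is an edge j -> i
        group = [i] + [j for j in range(n) if adjacency[j][i]]
        hidden.append([sum(node_features[j][d] for j in group) / len(group)
                       for d in range(dim)])
    return [sum(h[d] for h in hidden) / n for d in range(dim)]

def curve_derivative(y, z, weights):
    """Assumed dynamics: dy/dt = rate(z) * (ceiling(z) - y).
    A learned network would replace this hand-written form."""
    rate = math.exp(sum(w * x for w, x in zip(weights["rate"], z)))
    logit = sum(w * x for w, x in zip(weights["ceiling"], z))
    ceiling = 1.0 / (1.0 + math.exp(-logit))
    return rate * (ceiling - y)

def extrapolate(y0, z, weights, t_end, step=0.1):
    """Forward-Euler integration of the curve from t = 0 to t = t_end."""
    y, curve = y0, [y0]
    for _ in range(int(round(t_end / step))):
        y += step * curve_derivative(y, z, weights)
        curve.append(y)
    return curve

# Two toy 3-node architectures with one-hot operation types per node.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # 0 -> 1 -> 2
skip = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]    # adds a skip edge 0 -> 2

# Hand-set illustrative weights, not fitted to any data.
weights = {"rate": [-1.0, -0.5], "ceiling": [2.0, 1.0]}

z_chain = embed_architecture(chain, features)
z_skip = embed_architecture(skip, features)
curve_chain = extrapolate(0.1, z_chain, weights, t_end=5.0)
curve_skip = extrapolate(0.1, z_skip, weights, t_end=5.0)
```

Because the embedding enters the derivative, the two architectures yield different trajectories from the same initial accuracy, which is the property the abstract argues curve-only models lack.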
Taxonomy
Topics: Advanced Graph Neural Networks · Face and Expression Recognition · Machine Learning and Algorithms
