Pretraining a Neural Network before Knowing Its Architecture
Boris Knyazev

TL;DR
This paper investigates the effectiveness of Graph HyperNetwork (GHN) predictions for initializing large neural networks, especially on new architectures, and proposes post-processing techniques to enhance fine-tuning performance.
Contribution
It demonstrates the limitations of GHN predictions on recent architectures and introduces simple post-processing methods to improve fine-tuning outcomes.
Findings
GHN-predicted parameters are a less useful initialization for recent architectures such as ConvNeXt than for ResNet-50.
Predicted parameters lack sufficient diversity for successful gradient-based fine-tuning.
Post-processing of predicted parameters improves fine-tuning performance on ResNet-50 and ConvNeXt.
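The diversity finding above can be illustrated with a toy sketch. This summary does not say which post-processing the paper uses, so the noise step below is an assumption chosen for illustration, not the author's method. The names `mean_abs_cosine`, `add_noise`, and `rel_scale` are hypothetical, and the "predicted" layer is synthetic data, not GHN output.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_abs_cosine(filters):
    """Mean |cosine| over all filter pairs; values near 1 indicate low diversity."""
    n = len(filters)
    sims = [abs(cosine(filters[i], filters[j]))
            for i in range(n) for j in range(i + 1, n)]
    return sum(sims) / len(sims)


def add_noise(filters, rel_scale, rng):
    """Assumed post-processing: add Gaussian noise scaled by the layer's std."""
    flat = [w for f in filters for w in f]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
    return [[w + rng.gauss(0.0, rel_scale * std) for w in f] for f in filters]


rng = random.Random(0)
# Synthetic stand-in for a predicted layer: 8 filters that are near-copies
# of a single base filter, i.e. a layer with very little diversity.
base = [rng.gauss(0.0, 1.0) for _ in range(16)]
predicted = [[b + rng.gauss(0.0, 0.01) for b in base] for _ in range(8)]
processed = add_noise(predicted, rel_scale=0.5, rng=rng)

before = mean_abs_cosine(predicted)
after = mean_abs_cosine(processed)
print(f"mean |cos| before: {before:.3f}, after: {after:.3f}")
```

Near-identical filters receive near-identical gradients, which is one plausible reason low diversity would hamper gradient-based fine-tuning. Perturbing the filters lowers their pairwise similarity, at the cost of moving away from the predicted values.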
Abstract
Training large neural networks is possible by training a smaller hypernetwork that predicts parameters for the large ones. A recently released Graph HyperNetwork (GHN) trained this way on one million smaller ImageNet architectures is able to predict parameters for large unseen networks such as ResNet-50. While networks with predicted parameters lose performance on the source task, the predicted parameters have been found useful for fine-tuning on other tasks. We study whether fine-tuning based on the same GHN is still useful on novel strong architectures that were published after the GHN had been trained. We found that for recent architectures such as ConvNeXt, GHN initialization becomes less useful than for ResNet-50. One potential reason is the increased distribution shift of novel architectures from those used to train the GHN. We also found that the predicted parameters lack the…
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning
Methods: ConvNeXt · HyperNetwork
