Pretraining a Neural Network before Knowing Its Architecture

Boris Knyazev

arXiv:2207.10049·cs.CV·July 21, 2022

TL;DR

This paper investigates the effectiveness of Graph HyperNetwork (GHN) predictions for initializing large neural networks, especially on new architectures, and proposes post-processing techniques to enhance fine-tuning performance.

Contribution

It demonstrates the limitations of GHN predictions on recent architectures and introduces simple post-processing methods to improve fine-tuning outcomes.

Findings

01 GHN predictions are less effective for recent architectures like ConvNeXt.

02 Predicted parameters lack sufficient diversity for successful gradient-based fine-tuning.

03 Post-processing of predicted parameters improves fine-tuning performance on ResNet-50 and ConvNeXt.
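
To make the "diversity" finding concrete, the sketch below shows one way such a deficit could be measured and reduced. It is an illustration only: this summary does not specify the paper's diversity measure or its post-processing steps, so the metric (mean absolute pairwise cosine similarity between a layer's filters), the noise-based perturbation, and the names `mean_abs_cosine`, `add_noise`, and `rel_std` are all assumptions introduced here.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two flattened filters."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mean_abs_cosine(filters):
    """Mean |cosine similarity| over all filter pairs (needs at least two filters).

    Values near 1.0 mean the filters are almost collinear, i.e. low diversity.
    This metric is an assumption, not necessarily the one used in the paper.
    """
    pairs = [(i, j) for i in range(len(filters)) for j in range(i + 1, len(filters))]
    return sum(abs(cosine(filters[i], filters[j])) for i, j in pairs) / len(pairs)


def add_noise(filters, rel_std, rng):
    """Hypothetical post-processing: add Gaussian noise to every weight.

    The noise standard deviation is rel_std times the layer's own weight std.
    """
    flat = [w for f in filters for w in f]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
    return [[w + rng.gauss(0.0, rel_std * std) for w in f] for f in filters]


# Toy stand-in for one predicted layer: 8 filters of 27 weights each
# that are nearly identical copies of a single base filter.
rng = random.Random(0)
base = [rng.gauss(0.0, 1.0) for _ in range(27)]
predicted = [[b + rng.gauss(0.0, 0.01) for b in base] for _ in range(8)]
noisy = add_noise(predicted, rel_std=1.0, rng=rng)

print("similarity before:", mean_abs_cosine(predicted))
print("similarity after: ", mean_abs_cosine(noisy))
```

In this toy setup the near-duplicate filters score close to 1.0, and the perturbed copy scores lower. Whether a given noise level helps or hurts fine-tuning on a real network is an empirical question that this sketch does not answer.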

Abstract

Training large neural networks is possible by training a smaller hypernetwork that predicts parameters for the large ones. A recently released Graph HyperNetwork (GHN) trained this way on one million smaller ImageNet architectures is able to predict parameters for large unseen networks such as ResNet-50. While networks with predicted parameters lose performance on the source task, the predicted parameters have been found useful for fine-tuning on other tasks. We study if fine-tuning based on the same GHN is still useful on novel strong architectures that were published after the GHN had been trained. We found that for recent architectures such as ConvNeXt, GHN initialization becomes less useful than for ResNet-50. One potential reason is the increased distribution shift of novel architectures from those used to train the GHN. We also found that the predicted parameters lack the…

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning

Methods: ConvNeXt · HyperNetwork