InstaTune: Instantaneous Neural Architecture Search During Fine-Tuning
Sharath Nittur Sridhar, Souvik Kundu, Sairam Sundaresan, Maciej Szankin, Anthony Sarah

TL;DR
InstaTune is a novel method that performs neural architecture search during the fine-tuning of large pre-trained models, reducing time and computational costs while optimizing sub-networks for specific tasks and hardware platforms.
Contribution
It introduces a plug-and-play approach that leverages pre-trained weights and multi-objective evolutionary search to find Pareto-optimal sub-networks during fine-tuning.
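To make "Pareto-optimal sub-networks" concrete, here is a minimal sketch of the Pareto-filtering step only. It is not the paper's implementation and it omits the evolutionary search loop. The two objectives (higher accuracy, lower latency), the candidate names, and all numeric values are assumptions chosen for illustration; the summary above states only that the search is multi-objective.

```python
from typing import List, Tuple

# Hypothetical candidate record: (name, accuracy, latency_ms).
# All values below are invented for illustration, not results from the paper.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if `a` is at least as good as `b` on both objectives
    (accuracy up, latency down) and strictly better on at least one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates: List[Candidate] = [
    ("subnet_a", 0.80, 10.0),
    ("subnet_b", 0.78, 12.0),  # dominated by subnet_a
    ("subnet_c", 0.85, 20.0),
    ("subnet_d", 0.85, 25.0),  # dominated by subnet_c
    ("subnet_e", 0.70, 5.0),
]

front = pareto_front(candidates)
front_names = [name for name, _, _ in front]
print(front_names)
```

In a multi-objective evolutionary search, a filter like this would typically be applied to each generation of candidate sub-networks, with the surviving candidates used to produce the next generation.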
Findings
Outperforms baselines in accuracy and efficiency across various models.
Works effectively on both unimodal and multi-modal transformer architectures.
Reduces NAS time and resource consumption by integrating with fine-tuning.
Abstract
One-Shot Neural Architecture Search (NAS) algorithms often rely on training a hardware-agnostic super-network for a domain-specific task. Optimal sub-networks are then extracted from the trained super-network for different hardware platforms. However, training super-networks from scratch can be extremely time-consuming and compute-intensive, especially for large models that rely on a two-stage training process of pre-training and fine-tuning. State-of-the-art pre-trained models are available for a wide range of tasks, but their large sizes significantly limit their applicability on various hardware platforms. We propose InstaTune, a method that leverages off-the-shelf pre-trained weights for large models and generates a super-network during the fine-tuning stage. InstaTune has multiple benefits. Firstly, since the process happens during fine-tuning, it minimizes the overall time and…
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
