Differentiable Architecture Pruning for Transfer Learning
Nicolo Colombo, Yang Gao

TL;DR
This paper introduces a gradient-based method for extracting transferable, low-complexity neural network architectures from large models, enabling effective transfer learning with limited data and providing theoretical guarantees.
Contribution
It presents a novel gradient-based architecture pruning approach that disentangles architecture from weights, suitable for transfer learning and backed by convergence guarantees.
Findings
The extracted architectures transfer effectively when very few data points are available for fine-tuning.
The method produces architectures that can be retrained successfully.
Theoretical convergence guarantees are provided.
Abstract
We propose a new gradient-based approach for extracting sub-architectures from a given large model. Contrary to existing pruning methods, which are unable to disentangle the network architecture and the corresponding weights, our architecture-pruning scheme produces transferable new structures that can be successfully retrained to solve different tasks. We focus on a transfer-learning setup where architectures can be trained on a large data set but very few data points are available for fine-tuning them on new tasks. We define a new gradient-based algorithm that trains architectures of arbitrarily low complexity independently from the attached weights. Given a search space defined by an existing large neural model, we reformulate the architecture search task as a complexity-penalized subset-selection problem and solve it through a two-temperature relaxation scheme. We provide…
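To make "complexity-penalized subset selection" concrete, here is a minimal sketch of one generic way to relax such a problem so that gradient descent can be used. It is not the authors' algorithm. It uses a single sigmoid temperature, not the two-temperature scheme the abstract mentions, and it keeps the weights fixed instead of training the architecture independently of them. The function name relaxed_subset_selection, the linear model, the penalty strength lam, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def relaxed_subset_selection(X, y, lam=0.05, temperature=1.0, lr=0.5, steps=500):
    """Select input features of a fixed linear model via a relaxed binary mask.

    Hypothetical illustration: minimizes
        mean((X @ (m * w) - y) ** 2) + lam * sum(m),  with m = sigmoid(theta / temperature),
    over the mask logits theta only. The weights w are fitted once and frozen.
    """
    n, d = X.shape
    # Stand-in for the "existing large model": a least-squares fit on all features.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    theta = np.zeros(d)  # mask logits; every feature starts half on
    for _ in range(steps):
        m = sigmoid(theta / temperature)
        resid = X @ (m * w) - y
        # Gradient of the penalized loss with respect to the soft mask m.
        grad_m = (2.0 / n) * w * (X.T @ resid) + lam
        # Chain rule through the sigmoid relaxation.
        grad_theta = grad_m * m * (1.0 - m) / temperature
        theta -= lr * grad_theta
    selected = sigmoid(theta / temperature) > 0.5
    return theta, selected


# Synthetic example (assumed data): only the first two of six features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1]
theta, selected = relaxed_subset_selection(X, y)
print(selected)
```

In this toy setting the penalty term pushes the logits of uninformative features down, while the data-fit term pushes the logits of useful features up. A larger lam yields a sparser selection.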
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Machine Learning and ELM
Methods: Pruning
