K for the Price of 1: Parameter-efficient Multi-task and Transfer Learning

Pramod Kaushik Mudrakarta; Mark Sandler; Andrey Zhmoginov; Andrew Howard

arXiv:1810.10703 · cs.LG · February 26, 2019 · 30 cites

Open Access

TL;DR

This paper presents a parameter-efficient method for transfer and multi-task learning by learning small model patches, enabling effective adaptation with minimal additional parameters, and matching or surpassing traditional fine-tuning results.

Contribution

It introduces a novel approach that uses small parameter sets to adapt pretrained networks for multiple tasks, reducing the need for full fine-tuning and improving transfer learning efficiency.

Findings

01

Reusing 98% of the SSD feature extractor's parameters for a new task (1000-class image classification)

02

Learning scales and biases suffices for effective transfer

03

Achieves comparable performance to full fine-tuning with fewer parameters

Abstract

We introduce a novel method that enables parameter-efficient transfer and multi-task learning with deep neural networks. The basic approach is to learn a model patch - a small set of parameters - that will specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases is sufficient to convert a pretrained network to perform well on qualitatively different problems (e.g. converting a Single Shot MultiBox Detection (SSD) model into a 1000-class image classification model while reusing 98% of parameters of the SSD feature extractor). Similarly, we show that re-learning existing low-parameter layers (such as depth-wise convolutions) while keeping the rest of the network frozen also improves transfer-learning accuracy significantly. Our approach allows both simultaneous (multi-task) as well as sequential…
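The scale-and-bias patch described in the abstract can be illustrated with a minimal sketch. It assumes a toy linear layer standing in for the convolutional networks the paper works with, and it uses plain SGD on squared error, which is an assumption rather than the paper's training setup. The names `hidden_features`, `forward`, and `train_patch`, and all numeric values, are hypothetical.

```python
def hidden_features(x, frozen_w):
    """Apply the shared, frozen weights; these are never updated."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in frozen_w]


def forward(x, frozen_w, scale, bias):
    """Frozen features followed by the task-specific patch (scale and bias)."""
    feats = hidden_features(x, frozen_w)
    return [s * h + b for s, h, b in zip(scale, feats, bias)]


def train_patch(data, frozen_w, epochs=500, lr=0.05):
    """Fit only the per-output scale and bias with plain SGD on squared error."""
    n_out = len(frozen_w)
    scale = [1.0] * n_out  # patch starts as the identity transform
    bias = [0.0] * n_out
    for _ in range(epochs):
        for x, y in data:
            h = hidden_features(x, frozen_w)
            for j in range(n_out):
                err = scale[j] * h[j] + bias[j] - y[j]
                scale[j] -= lr * err * h[j]  # gradient of 0.5*err**2 w.r.t. scale[j]
                bias[j] -= lr * err          # gradient of 0.5*err**2 w.r.t. bias[j]
    return scale, bias


# Toy setup (all values hypothetical): a 2x2 frozen layer shared across tasks.
frozen_w = [[1.0, -1.0], [0.5, 2.0]]
inputs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]

# The "new task" is constructed so that rescaling and shifting the frozen
# features is enough to solve it.
target_scale, target_bias = [2.0, -0.5], [0.3, 1.0]
data = [(x, forward(x, frozen_w, target_scale, target_bias)) for x in inputs]

scale, bias = train_patch(data, frozen_w)

shared_params = sum(len(row) for row in frozen_w)
patch_params = len(scale) + len(bias)
print("learned scale:", [round(s, 3) for s in scale])
print("learned bias:", [round(b, 3) for b in bias])
print("patch parameters per task:", patch_params, "| shared parameters:", shared_params)
```

In this toy the patch is as large as the frozen layer, so it shows only the mechanics. The saving appears at realistic sizes: a dense or 1x1 layer with C_in inputs and C_out outputs has C_in * C_out weights, while a per-output scale and bias adds only 2 * C_out parameters per task.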

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and ELM

Methods: Convolution · Non Maximum Suppression · 1x1 Convolution · SSD