Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks

Jeffrey O Zhang; Alexander Sax; Amir Zamir; Leonidas Guibas; Jitendra Malik

arXiv:1912.13503·cs.LG·August 3, 2020

TL;DR

Side-tuning adapts a pre-trained network by summing its output with that of a lightweight, separately trained side network while leaving the pre-trained weights unchanged. It performs as well as or better than fine-tuning and fixed-feature extraction, and it reduces overfitting and catastrophic forgetting across diverse tasks.

Contribution

The paper proposes side-tuning, a simple additive method for network adaptation that matches or outperforms existing approaches such as fine-tuning and fixed-feature extraction.

Findings

01 Performs well across multiple tasks including vision, NLP, and reinforcement learning.

02 Reduces overfitting and catastrophic forgetting compared to traditional methods.

03 Demonstrates consistent improvements in diverse adaptation scenarios.

Abstract

When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than starting from randomly initialized weights. Adaptation can be useful in cases when training data is scarce, when a single learner needs to perform multiple tasks, or when one wishes to encode priors in the network. The most commonly employed approaches for network adaptation are fine-tuning and using the pre-trained network as a fixed feature extractor, among others. In this paper, we propose a straightforward alternative: side-tuning. Side-tuning adapts a pre-trained network by training a lightweight "side" network that is fused with the (unchanged) pre-trained network via summation. This simple method works as well as or better than existing solutions and it resolves some of the basic issues with fine-tuning, fixed features, and other common approaches. In particular,…
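The additive fusion described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a 1-D regression problem, a hand-written linear "base" function standing in for the frozen pre-trained network, and plain per-sample gradient descent. All names (`base`, `SideNetwork`, `side_tuned`, `train_side`) are hypothetical and chosen only for illustration.

```python
# Minimal side-tuning sketch on a 1-D regression toy problem.
# All names here (base, SideNetwork, side_tuned, train_side) are hypothetical.

def base(x):
    """Stand-in for the frozen pre-trained network; its weight is never updated."""
    return 2.0 * x


class SideNetwork:
    """Lightweight trainable model: a single affine map w * x + b."""

    def __init__(self):
        # Zero initialization (a choice of this sketch) makes the combined
        # model start out identical to the base network.
        self.w = 0.0
        self.b = 0.0

    def __call__(self, x):
        return self.w * x + self.b


def side_tuned(x, side):
    """Fuse base and side outputs by summation."""
    return base(x) + side(x)


def train_side(data, side, lr=0.05, epochs=200):
    """Fit only the side network with per-sample gradient descent on squared error."""
    for _ in range(epochs):
        for x, y in data:
            err = side_tuned(x, side) - y  # derivative of 0.5 * err**2 w.r.t. the output
            side.w -= lr * err * x
            side.b -= lr * err
    return side


# Toy target task: y = 3x + 1, so the side network should learn the residual x + 1.
data = [(x, 3.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
side = train_side(data, SideNetwork())
prediction = side_tuned(2.0, side)
```

Because only `side.w` and `side.b` are updated, the base function behaves the same before and after training, which is the property the abstract refers to as the "(unchanged) pre-trained network". The abstract specifies only summation; any weighting or scheduling of the two outputs is outside the scope of this sketch.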
