TL;DR
Side-tuning adapts a pre-trained neural network by training a lightweight side network and adding its output to that of the unchanged pre-trained network. It performs as well as or better than fine-tuning and fixed feature extraction, while reducing overfitting and catastrophic forgetting across diverse tasks.
Contribution
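The paper proposes side-tuning, a simple additive method for network adaptation: a lightweight side network is trained and fused by summation with the unchanged pre-trained network. The authors report that it works as well as or better than existing approaches such as fine-tuning and feature extraction.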
Findings
Performs well across multiple domains, including vision, NLP, and reinforcement learning.
Reduces overfitting and catastrophic forgetting compared to traditional methods.
Demonstrates consistent improvements in diverse adaptation scenarios.
Abstract
When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than starting from randomly initialized weights. Adaptation can be useful in cases when training data is scarce, when a single learner needs to perform multiple tasks, or when one wishes to encode priors in the network. The most commonly employed approaches for network adaptation are fine-tuning and using the pre-trained network as a fixed feature extractor, among others. In this paper, we propose a straightforward alternative: side-tuning. Side-tuning adapts a pre-trained network by training a lightweight "side" network that is fused with the (unchanged) pre-trained network via summation. This simple method works as well as or better than existing solutions and it resolves some of the basic issues with fine-tuning, fixed features, and other common approaches. In particular,…
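The summation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the one-dimensional linear "base" model, the `Side` class, the synthetic target task, and the learning rate are all hypothetical choices made only to show the idea that the base stays frozen while a small side model is trained and added to its output.

```python
# Toy illustration of additive adaptation: output = base(x) + side(x).
# All names and numbers here are assumptions for illustration only.

BASE_W = 2.0  # frozen "pre-trained" weight; never updated below


def base(x):
    """Frozen pre-trained model (hypothetical): a fixed linear map."""
    return BASE_W * x


class Side:
    """Lightweight trainable side model: a single weight and bias."""

    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def __call__(self, x):
        return self.w * x + self.b


def side_tuned(x, side):
    """Fuse the frozen base and the side model by summation."""
    return base(x) + side(x)


def train_side(data, side, lr=0.05, epochs=200):
    """Plain SGD on squared error; only the side parameters change."""
    for _ in range(epochs):
        for x, y in data:
            err = side_tuned(x, side) - y  # gradient of 0.5 * err**2 w.r.t. the prediction
            side.w -= lr * err * x
            side.b -= lr * err
    return side


# Hypothetical new task: y = 2.5 * x + 1, which the base alone (y = 2 * x) cannot fit.
data = [(i / 10.0, 2.5 * (i / 10.0) + 1.0) for i in range(-10, 11)]
side = train_side(data, Side())
```

Because only `side.w` and `side.b` are updated, the base model behaves exactly as it did before adaptation, which is the intuition behind the reduced forgetting mentioned above. In this toy setup the side model only has to learn the residual between the base output and the new target.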
