Alleviating Representational Shift for Continual Fine-tuning

Shibo Jie; Zhi-Hong Deng; Ziheng Li

arXiv:2204.10535 · cs.CV · May 10, 2022

TL;DR

This paper introduces ConFiT, a fine-tuning method for continual learning that mitigates representational shift by combining cross-convolution batch normalization (Xconv BN) with hierarchical fine-tuning.

Contribution

The paper proposes ConFiT, a fine-tuning method that addresses representational shift both in the features (penultimate-layer representations) and in the intermediate layers, where the shift disrupts batch normalization.

Findings

01 Outperforms state-of-the-art methods on four datasets.

02 Reduces catastrophic forgetting with lower storage overhead.

03 Effectively maintains feature representations during fine-tuning.

Abstract

We study a practical setting of continual learning: continually fine-tuning a pre-trained model. Previous work has found that, when training on new tasks, the features (penultimate layer representations) of previous data change, a phenomenon called representational shift. Besides the shift of features, we reveal that the intermediate layers' representational shift (IRS) also matters: it disrupts batch normalization, making it another crucial cause of catastrophic forgetting. Motivated by this, we propose ConFiT, a fine-tuning method incorporating two components, cross-convolution batch normalization (Xconv BN) and hierarchical fine-tuning. Xconv BN maintains pre-convolution running means instead of post-convolution ones, and recovers the post-convolution means before testing, which corrects the inaccurate estimates of means under IRS. Hierarchical fine-tuning leverages a multi-stage strategy to…
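The Xconv BN idea in the abstract rests on the fact that a convolution is linear, so the mean of its outputs can be recomputed from the mean of its inputs. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes a 1x1 convolution (modeled as a per-position linear map), assumes the pre-convolution statistics of old data stay fixed while the convolution weights change, and uses hypothetical names (`conv1x1`, `channel_mean`, and the toy values) that do not come from the paper or its repository.

```python
# Illustrative sketch only: a 1x1 convolution modeled as a per-position linear map.
# All names and values here are hypothetical, not taken from the ConFiT code base.

def conv1x1(weight, bias, x):
    """Apply a 1x1 convolution to one feature vector x (one value per input channel)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def channel_mean(vectors):
    """Per-channel mean over a list of feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Pre-convolution features of old-task data (3 positions, 2 channels).
old_inputs = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]

# Convolution parameters at the time the old task was learned.
w_old, b_old = [[1.0, 0.0], [0.5, 0.5]], [0.0, 1.0]

# Standard BN stores the post-convolution running mean.
post_mean_stored = channel_mean([conv1x1(w_old, b_old, x) for x in old_inputs])

# Assumed Xconv-BN-style alternative: store the pre-convolution mean instead.
pre_mean_stored = channel_mean(old_inputs)

# Fine-tuning on a new task changes the convolution parameters.
w_new, b_new = [[2.0, -1.0], [0.0, 1.0]], [0.5, 0.0]

# Ground truth: post-convolution mean of old data under the new parameters.
true_post_mean = channel_mean([conv1x1(w_new, b_new, x) for x in old_inputs])

# Recovery before testing: push the stored pre-convolution mean through the new conv.
recovered_post_mean = conv1x1(w_new, b_new, pre_mean_stored)

print("stale stored post-conv mean:", post_mean_stored)
print("true post-conv mean        :", true_post_mean)
print("recovered from pre-conv    :", recovered_post_mean)
```

In this toy setting the stored post-convolution mean becomes stale once the weights change, while the mean recovered from the pre-convolution statistics matches the true one. With larger kernels, padding, or shifted inputs the match would only be approximate, and the sketch makes no claim about how the paper handles those cases.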

Code & Models

Repositories

jieshibo/confit (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Batch Normalization