Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

Rongyu Zhang; Aosong Cheng; Yulin Luo; Gaole Dai; Huanrui Yang; Jiaming Liu; Ran Xu; Li Du; Yuan Du; Yanbing Jiang; Shanghang Zhang

arXiv:2405.16486·cs.CV·May 28, 2024

TL;DR

This paper introduces MoASE, a mixture of experts approach with activation sparsity and specialized gating mechanisms, to improve continual test-time adaptation in vision models, reducing catastrophic forgetting and enhancing performance.

Contribution

The paper proposes a novel MoASE framework with multi-gate structures and a homeostatic loss, advancing CTTA by decomposing neuron activations and adaptively combining experts.

Findings

1. Achieves state-of-the-art results on four benchmarks.
2. Effectively reduces catastrophic forgetting during adaptation.
3. Demonstrates superior performance in both classification and segmentation tasks.

Abstract

Continual Test-Time Adaptation (CTTA), which aims to adapt the pre-trained model to ever-evolving target domains, emerges as an important task for vision models. As current vision models appear to be heavily biased towards texture, continuously adapting the model from one domain distribution to another can result in serious catastrophic forgetting. Drawing inspiration from the human visual system's adeptness at processing both shape and texture according to the famous Trichromatic Theory, we explore the integration of a Mixture-of-Activation-Sparsity-Experts (MoASE) as an adapter for the CTTA task. Given the distinct reaction of neurons with low/high activation to domain-specific/agnostic features, MoASE decomposes the neural activation into high-activation and low-activation components with a non-differentiable Spatial Differentiate Dropout (SDD). Based on the decomposition, we devise…
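As a rough illustration of the decomposition idea, the sketch below splits a list of activations into a high-activation part and a low-activation part that add back up to the input. It is not the paper's Spatial Differentiate Dropout: the abstract does not say how the split is chosen, so the magnitude-ranking rule, the function name `split_by_activation`, and the `keep_ratio` parameter are all assumptions made for this example.

```python
from typing import List, Tuple


def split_by_activation(
    acts: List[float], keep_ratio: float = 0.5
) -> Tuple[List[float], List[float]]:
    """Split activations into high and low components that sum to the input.

    Hypothetical helper: ranking by magnitude and keeping a fixed ratio is an
    assumption for illustration, not the rule used by the paper's SDD.
    """
    if not 0.0 <= keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in [0, 1]")

    # Number of entries assigned to the high-activation component.
    n_high = round(len(acts) * keep_ratio)

    # Rank positions by activation magnitude, largest first.
    order = sorted(range(len(acts)), key=lambda i: abs(acts[i]), reverse=True)
    high_idx = set(order[:n_high])

    # Each position keeps its value in exactly one component and is zero in the other.
    high = [a if i in high_idx else 0.0 for i, a in enumerate(acts)]
    low = [0.0 if i in high_idx else a for i, a in enumerate(acts)]
    return high, low


# Usage: the two masked components are complementary.
high, low = split_by_activation([0.1, 2.0, -0.3, 1.5], keep_ratio=0.5)
```

Because each position is routed to exactly one component, the two outputs could be passed to separate experts and recombined without losing information; how MoASE actually routes and combines them is described in the full paper, not here.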

Code & Models

Repositories

royzry98/moase-pytorch (PyTorch, official)

Videos

Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation · Underline

Taxonomy

Topics: Neural Networks and Applications · EEG and Brain-Computer Interfaces

Methods: Adapter · Feature Selection · Dropout