Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation
Rongyu Zhang, Aosong Cheng, Yulin Luo, Gaole Dai, Huanrui Yang, Jiaming Liu, Ran Xu, Li Du, Yuan Du, Yanbing Jiang, Shanghang Zhang

TL;DR
This paper introduces MoASE, a mixture of experts approach with activation sparsity and specialized gating mechanisms, to improve continual test-time adaptation in vision models, reducing catastrophic forgetting and enhancing performance.
Contribution
The paper proposes a novel MoASE framework with multi-gate structures and a homeostatic loss, advancing CTTA by decomposing neuron activations and adaptively combining experts.
Findings
Achieves state-of-the-art results on four benchmarks.
Effectively reduces catastrophic forgetting during adaptation.
Demonstrates superior performance in both classification and segmentation tasks.
Abstract
Continual Test-Time Adaptation (CTTA), which aims to adapt the pre-trained model to ever-evolving target domains, emerges as an important task for vision models. As current vision models appear to be heavily biased towards texture, continuously adapting the model from one domain distribution to another can result in serious catastrophic forgetting. Drawing inspiration from the human visual system's adeptness at processing both shape and texture according to the famous Trichromatic Theory, we explore the integration of a Mixture-of-Activation-Sparsity-Experts (MoASE) as an adapter for the CTTA task. Given the distinct reaction of neurons with low/high activation to domain-specific/agnostic features, MoASE decomposes the neural activation into high-activation and low-activation components with a non-differentiable Spatial Differentiate Dropout (SDD). Based on the decomposition, we devise…
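As a rough illustration of the decomposition idea described in the abstract, the sketch below splits a feature map into a high-activation part and a low-activation part that sum back to the input. It is not the paper's SDD: the quantile-based threshold, the function name `split_by_activation`, and the `keep_ratio` parameter are all assumptions made here for illustration only.

```python
import numpy as np

def split_by_activation(x, keep_ratio=0.5):
    """Split x into high- and low-activation parts (hypothetical helper).

    Assumption: the split uses a per-sample quantile threshold. The
    paper's SDD may choose and apply its mask differently.
    """
    flat = x.reshape(x.shape[0], -1)
    # Per-sample threshold at the (1 - keep_ratio) quantile of activations.
    thresh = np.quantile(flat, 1.0 - keep_ratio, axis=1)
    # Reshape so the threshold broadcasts over all non-batch axes.
    thresh = thresh.reshape(-1, *([1] * (x.ndim - 1)))
    mask = x >= thresh
    high = np.where(mask, x, 0.0)   # keeps only the strongest activations
    low = np.where(mask, 0.0, x)    # keeps everything else
    return high, low

# Toy feature map with shape (batch, channels, height, width).
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 8, 8))
high, low = split_by_activation(x, keep_ratio=0.25)
```

Because the two masks are complementary, `high + low` reproduces `x` exactly, so in a sketch like this each part could be routed to a different expert without losing information.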
Taxonomy
Topics: Neural Networks and Applications · EEG and Brain-Computer Interfaces
Methods: Adapter · Feature Selection · Dropout
