MoASE++: Mixture of Activation Sparsity Experts with Domain-Adaptive On-policy Distillation for Continual Test Time Adaptation

Ronyu Zhang; Aosong Cheng; Gaole Dai; Yulin Luo; Jiaming Liu; Li Du; Huanrui Yang; Dan Wang; Leyuan Fang; Yuan Du; Shanghang Zhang

arXiv:2605.17743·cs.CV·May 19, 2026

TL;DR

MoASE++ introduces a novel mixture-of-experts framework with domain-adaptive distillation for continual test-time adaptation, effectively handling non-stationary data streams while mitigating catastrophic forgetting.

Contribution

The paper proposes a plug-in mixture-of-experts module with activation sparsity and domain-aware routing, combined with adaptive distillation to improve continual adaptation performance.

Findings

01. Achieves state-of-the-art results on CIFAR-10/100-C and ImageNet-C

02. Demonstrates robustness and stability in semantic segmentation tasks

03. Effectively balances plasticity and stability in dynamic environments

Abstract

Continual test-time adaptation adapts a source-pretrained model to non-stationary, unlabeled target streams while retaining past competence, yet texture-biased backbones risk error accumulation and catastrophic forgetting. Drawing inspiration from the way the human visual system decouples shape and texture, we introduce MoASE, a plug-in mixture-of-experts that disentangles domain-agnostic structure from domain-specific texture using Activation Sparsity Experts with Spatial Differentiable Dropout (SDD), forming complementary high- and low-activation pathways, while high- and low-rank bottlenecks diversify representations. The Activation Sparsity Gate produces input-adaptive SDD thresholds for precise token selection, and the Domain-Aware Router assigns per-sample expert weights using texture-sensitive cues. To curb confirmation bias on unlabeled streams and stabilize supervision, we…
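The abstract does not give the exact form of the thresholded token selection or the router, so the following is only a minimal sketch of the general idea under stated assumptions: a sigmoid relaxation stands in for a differentiable threshold, its complement forms the low-activation pathway, and a softmax over two hypothetical router logits gives per-sample expert weights. The function names, the `temperature` parameter, and the scalar per-token scores are all illustrative and are not taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_token_masks(scores, threshold, temperature=0.1):
    """Return complementary (high, low) soft masks over per-token scores.

    Assumption: a sigmoid around the threshold replaces a hard cut so the
    selection stays differentiable; the paper's actual SDD may differ.
    """
    high = [sigmoid((s - threshold) / temperature) for s in scores]
    low = [1.0 - h for h in high]  # low pathway keeps what the high one drops
    return high, low


def softmax(logits):
    """Numerically stable softmax, used here as a stand-in router."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(scores, threshold, router_logits):
    """Blend the two masked pathways with per-sample router weights.

    Assumption: each "expert" is reduced to its mask applied to the token
    score; real experts would also transform the masked features.
    """
    high, low = soft_token_masks(scores, threshold)
    w_high, w_low = softmax(router_logits)
    return [w_high * h * s + w_low * l * s
            for s, h, l in zip(scores, high, low)]


if __name__ == "__main__":
    token_scores = [0.9, 0.1, 0.6, 0.3]  # hypothetical per-token activations
    high, low = soft_token_masks(token_scores, threshold=0.5)
    print("high mask:", [round(h, 3) for h in high])
    print("low mask: ", [round(l, 3) for l in low])
    print("mixed:    ", mix_experts(token_scores, 0.5, [1.0, 0.0]))
```

In this sketch the threshold is passed in as a constant; an input-adaptive gate would instead predict it from the features, and the router logits would come from texture-sensitive statistics of the input rather than being fixed by hand.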
