MoETTA: Test-Time Adaptation Under Mixed Distribution Shifts with MoE-LayerNorm

Xiao Fan; Jingyan Jiang; Zhaoru Chen; Fanding Huang; Xiao Chen; Qinting Jiang; Bowen Zhang; Xing Tang; and Zhi Wang

arXiv:2511.13760·cs.LG·November 19, 2025

TL;DR

MoETTA introduces a novel entropy-based test-time adaptation framework using Mixture-of-Experts to handle complex mixed distribution shifts, outperforming existing methods on new realistic benchmarks.

Contribution

The paper proposes MoETTA, a Mixture-of-Experts based TTA method that enables diverse adaptation directions, addressing limitations of unified adaptation paths under heterogeneous shifts.
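To make the idea of expert-specific adaptation directions concrete, here is a minimal sketch of a layer normalization whose affine parameters are chosen per sample from several expert parameter sets. The page does not describe the routing mechanism, so everything here is an assumption made for illustration: the names `route_top1` and `moe_layer_norm`, the linear router, and the top-1 selection are hypothetical and should not be read as the paper's actual design.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    # Standard layer normalization followed by an elementwise affine transform.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]

def route_top1(x, router_weights):
    # Hypothetical linear router: one score per expert, pick the highest.
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    return max(range(len(scores)), key=scores.__getitem__)

def moe_layer_norm(x, experts, router_weights):
    # Assumption: each expert is a (gamma, beta) pair, so different samples
    # can be normalized (and later adapted) with different parameters.
    idx = route_top1(x, router_weights)
    gamma, beta = experts[idx]
    return idx, layer_norm(x, gamma, beta)

experts = [
    ([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]),  # expert 0
    ([2.0, 2.0, 2.0], [1.0, 1.0, 1.0]),  # expert 1
]
router_weights = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

idx_a, out_a = moe_layer_norm([3.0, 1.0, 0.0], experts, router_weights)
idx_b, out_b = moe_layer_norm([0.0, 1.0, 3.0], experts, router_weights)
```

In this sketch the two inputs are routed to different experts, so a gradient update driven by one sample would only touch the parameters of the expert that processed it, which is one plausible way to avoid forcing all domains onto a single adaptation path.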

Findings

01 MoETTA achieves state-of-the-art performance on new benchmarks.

02 Modeling multiple adaptation directions improves robustness.

03 Experiments demonstrate consistent outperformance over baselines.

Abstract

Test-time adaptation (TTA) has proven effective in mitigating performance drops under single-domain distribution shifts by updating model parameters during inference. However, real-world deployments often involve mixed distribution shifts, where test samples are affected by diverse and potentially conflicting domain factors, posing significant challenges even for state-of-the-art (SOTA) TTA methods. A key limitation of existing approaches is their reliance on a unified adaptation path, which fails to account for the fact that optimal gradient directions can vary significantly across domains. Moreover, current benchmarks focus only on synthetic or homogeneous shifts and fail to capture the complexity of real-world heterogeneous mixed distribution shifts. To address these limitations, we propose MoETTA, a novel entropy-based TTA framework that integrates the Mixture-of-Experts (MoE) architecture. Rather than…
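As a rough illustration of what "entropy-based" adaptation during inference means, the sketch below computes the Shannon entropy of a softmax prediction and takes one gradient-descent step on a single scalar parameter to lower it. This is a generic entropy-minimization toy, not MoETTA itself: the scalar `gamma`, the helper `adapt_scale`, the learning rate, and the finite-difference gradient are all assumptions chosen to keep the example dependency-free.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prediction_entropy(logits):
    # Shannon entropy (in nats) of the softmax prediction.
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def adapt_scale(logits, gamma=1.0, lr=0.1, eps=1e-4):
    # One unsupervised update of a hypothetical scalar parameter `gamma`
    # that scales the logits; the gradient is a central finite difference.
    def loss(g):
        return prediction_entropy([g * x for x in logits])
    grad = (loss(gamma + eps) - loss(gamma - eps)) / (2 * eps)
    return gamma - lr * grad

logits = [2.0, 1.0, 0.0]
before = prediction_entropy(logits)
gamma = adapt_scale(logits)
after = prediction_entropy([gamma * x for x in logits])

uniform = prediction_entropy([1.0, 1.0, 1.0])  # maximal entropy for 3 classes
```

No labels are used anywhere in the update, which is what allows this kind of objective to run at test time. In practice the adapted parameters would be model weights (for example normalization parameters) updated with automatic differentiation rather than a single scalar.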

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Software System Performance and Reliability