PRISM: Exposing and Resolving Spurious Isolation in Federated Multimodal Continual Learning
Beining Wu, Zihao Ding, Jun Huang

TL;DR
PRISM introduces a per-expert gradient subspace basis to improve federated multimodal continual learning. It targets two problems: task interference that persists inside experts, and the unverified assumption that routing isolates tasks. It reports accuracy gains over 16 state-of-the-art baselines.
Contribution
It proposes a novel per-expert gradient subspace basis whose orthogonality is preserved under federated averaging (FedAvg), improving task separation in federated learning over mixture-of-experts LoRA (MoE-LoRA).
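The following rough illustration shows why a gradient subspace basis can survive federated averaging. It assumes that each expert's protected basis is held by the server and shared by all clients. This is an assumption: the summary does not say how PRISM builds or synchronizes the basis. Each client removes the protected directions from its gradient (g - U U^T g). Both that projection and the FedAvg weighted mean are linear, so the aggregated update stays orthogonal to the basis. The basis vectors, gradients, client weights, and helper names are all hypothetical.

```python
"""Sketch: per-expert gradient projection with a shared basis, then FedAvg.

Assumptions (hypothetical, not the paper's implementation): the expert's
protected subspace is an orthonormal basis shared by all clients, gradients
are flattened vectors, and FedAvg is a weighted mean of client updates.
"""
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(vectors: List[Vector], tol: float = 1e-10) -> List[Vector]:
    """Return an orthonormal basis spanning `vectors` (dependent ones dropped)."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = dot(w, w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def project_out(grad: Vector, basis: List[Vector]) -> Vector:
    """Remove the components of `grad` in span(basis): g - U U^T g."""
    out = list(grad)
    for u in basis:
        c = dot(grad, u)
        out = [oi - c * ui for oi, ui in zip(out, u)]
    return out


def fedavg(updates: List[Vector], weights: List[float]) -> Vector:
    """Weighted average of client updates (the FedAvg aggregation step)."""
    total = sum(weights)
    dim = len(updates[0])
    return [sum(w * upd[i] for upd, w in zip(updates, weights)) / total
            for i in range(dim)]


# Hypothetical protected subspace for one expert, from earlier tasks.
expert_basis = gram_schmidt([[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]])

# Hypothetical raw gradients from three clients for this expert on a new task.
client_grads = [
    [0.5, -1.0, 2.0, 0.3],
    [1.5, 0.2, -0.7, 1.1],
    [-0.4, 0.9, 0.1, -2.0],
]
projected = [project_out(g, expert_basis) for g in client_grads]
global_update = fedavg(projected, weights=[10, 30, 60])

# Projection against a shared basis and averaging are both linear, so the
# aggregated update has (numerically) no component along protected directions.
max_leak = max(abs(dot(global_update, u)) for u in expert_basis)
print(f"max |<update, u>| = {max_leak:.2e}")
```

The check relies on every client projecting against the same basis. With client-specific bases, averaging can reintroduce components that individual clients removed.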
Findings
PRISM outperforms 16 state-of-the-art baselines in accuracy.
Performance margin increases from +3.23 pp to +6.06 pp over baselines.
Effective across multiple large-scale multimodal datasets.
Abstract
Current federated multimodal continual learning over mixture-of-experts low-rank adaptation (MoE-LoRA) rests on the unverified assumption that routing isolates task-specific knowledge into disjoint experts. We argue that this isolation is spurious. Routing operates per sample, but forgetting accumulates across the task sequence. As a result, gradient conflict persists within each expert even when routing is maximally polarized. Activation-subspace protection can also fail. Under parameter-efficient fine-tuning, a dimension-counting bound forces it to entangle tasks, and federated averaging (FedAvg) disrupts client-side orthogonality. To address this, we propose PRISM (Per-expert Routing-projection Interference-informed Subspace Method). PRISM maintains a per-expert gradient subspace basis whose orthogonality is preserved under FedAvg, and it reinterprets MoE routing as a capacity allocator. Our…
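A toy example can illustrate the claim that FedAvg disrupts client-side orthogonality. The sketch below uses a hypothetical setup, not the authors' analysis. Two clients each remove one locally estimated direction from their update. Each client's update still carries the component that the other client removed, so the equal-weight average leaks back into both protected directions. All vectors and helper names are illustrative.

```python
"""Illustrative sketch: FedAvg can undo orthogonality enforced on each client.

Assumption (hypothetical setup, not the paper's construction): every client
projects its update away from a protected direction it estimated locally,
and those directions differ between clients.
"""
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def project_out(grad: Vector, basis: List[Vector]) -> Vector:
    """Compute g - U U^T g for an orthonormal basis U."""
    out = list(grad)
    for u in basis:
        c = dot(grad, u)
        out = [oi - c * ui for oi, ui in zip(out, u)]
    return out


# Each client protects a different, locally estimated direction.
u_a = normalize([1.0, 0.0, 0.0])
u_b = normalize([0.0, 1.0, 0.0])

# Client-side projection: each local update is orthogonal to its own direction.
g_a = project_out([1.0, 2.0, 3.0], [u_a])  # [0.0, 2.0, 3.0]
g_b = project_out([4.0, 5.0, 6.0], [u_b])  # [4.0, 0.0, 6.0]

# Server-side FedAvg with equal client weights.
avg = [(x + y) / 2 for x, y in zip(g_a, g_b)]  # [2.0, 1.0, 4.5]

# The averaged update is no longer orthogonal to either protected direction.
leak_a = abs(dot(avg, u_a))
leak_b = abs(dot(avg, u_b))
print(f"leak along u_a: {leak_a}, leak along u_b: {leak_b}")  # 2.0, 1.0
```

The example only shows the mechanism in three dimensions. It says nothing about the size of the effect in PRISM's experimental setting.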
