TL;DR
This paper introduces BadPatches, a routing-aware backdoor attack on vision MoE models that applies triggers to individual image patches. It achieves high attack success rates at very low poisoning rates, and the paper shows that fine-tuning is an effective defense against it.
Contribution
The paper presents BadPatches, a novel patch-based backdoor attack tailored to vision MoE models, and shows that it remains effective even when the adversary has incomplete knowledge of the model.
Findings
BadPatches achieves over 80% attack success rate at 0.01% poisoning rate.
Routing-aware triggers outperform routing-agnostic ones in attack success.
Fine-tuning effectively removes backdoors introduced by BadPatches.
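To put the 0.01% poisoning rate in perspective, the short sketch below converts a rate given in percent into a count of poisoned training images. This summary does not state which datasets the paper uses, so the two training-set sizes are illustrative assumptions (50,000 is the CIFAR-10 training split, 1,281,167 the ImageNet-1k training split). The rounding rule (always round up, so any nonzero rate poisons at least one image) is also an assumption, not something taken from the paper.

```python
from fractions import Fraction
import math


def num_poisoned(train_size: int, rate_percent: str) -> int:
    """Number of training images to poison for a rate given in percent.

    Fraction avoids floating-point rounding surprises; math.ceil is an
    assumed rounding rule so that any nonzero rate poisons at least one image.
    """
    rate = Fraction(rate_percent) / 100
    return math.ceil(rate * train_size)


# Illustrative training-set sizes (assumptions, not taken from the paper).
for n in (50_000, 1_281_167):
    print(n, num_poisoned(n, "0.01"))
```

Under these assumptions, 0.01% corresponds to only 5 poisoned images out of 50,000, or 129 out of roughly 1.28 million.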
Abstract
Mixture of Experts (MoE) architectures have gained popularity for reducing computational costs in deep neural networks by activating only a subset of parameters during inference. While this efficiency makes MoE attractive for vision tasks, the patch-based processing in vision models opens new ways for adversaries to mount backdoor attacks. In this work, we investigate the vulnerability of vision MoE models for image classification, specifically patch-based MoE (pMoE) models and MoE-based vision transformers, to backdoor attacks. We propose BadPatches, a novel routing-aware trigger application method designed for patch-based processing in vision MoE models. BadPatches applies triggers on image patches rather than on the entire image. We show that BadPatches achieves high attack success rates (ASRs) with lower poisoning rates than routing-agnostic triggers and…
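The difference between patch-level and whole-image trigger application can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: this summary does not describe how BadPatches selects patches from routing information, so `patch_ids` is a hypothetical, caller-supplied list of patch indices, and the trigger shape and its placement (top-left corner of each chosen patch, bottom-right corner of the image for the baseline) are assumptions made only for illustration.

```python
import numpy as np


def apply_patch_triggers(image, patch_size, patch_ids, trigger):
    """Stamp a small trigger into each selected patch (patch-level application).

    image:     (H, W, C) array; H and W must be divisible by patch_size.
    patch_ids: row-major patch indices to modify. Hypothetical input: how a
               routing-aware attack would choose them is not modeled here.
    trigger:   square (t, t, C) array with t <= patch_size, placed at the
               top-left corner of each selected patch (an assumed placement).
    """
    h, w, _ = image.shape
    t = trigger.shape[0]
    if h % patch_size or w % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    if trigger.shape[1] != t or t > patch_size:
        raise ValueError("trigger must be square and fit inside one patch")
    cols = w // patch_size
    n_patches = (h // patch_size) * cols
    out = image.copy()  # leave the clean image untouched
    for pid in patch_ids:
        if not 0 <= pid < n_patches:
            raise ValueError(f"patch index {pid} out of range")
        r, c = divmod(pid, cols)
        y, x = r * patch_size, c * patch_size
        out[y:y + t, x:x + t] = trigger
    return out


def apply_image_trigger(image, trigger):
    """Routing-agnostic baseline: one trigger in the image's bottom-right corner."""
    out = image.copy()
    t = trigger.shape[0]
    out[-t:, -t:] = trigger
    return out


# Usage: a 32x32 RGB image split into 8x8 patches gives a 4x4 grid of 16 patches.
img = np.zeros((32, 32, 3), dtype=np.uint8)
trigger = np.full((3, 3, 3), 255, dtype=np.uint8)
patched = apply_patch_triggers(img, 8, [0, 5], trigger)
baseline = apply_image_trigger(img, trigger)
print((patched != img).any(axis=-1).sum(), (baseline != img).any(axis=-1).sum())
```

The patch-level version modifies pixels inside specific patches, which is what a patch-based model sees as separate tokens, whereas the baseline places a single trigger without regard to the patch grid.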
