BadRSSD: Backdoor Attacks on Regularized Self-Supervised Diffusion Models
Jiayao Wang, Yiping Zhang, Mohammad Maruf Hasan, Xiaoying Lei, Jiale Zhang, Junwu Zhu, Qilin Wu, and Dongfang Zhao

TL;DR
This paper introduces BadRSSD, a backdoor attack on the representation layer of self-supervised diffusion models that enables stealthy, targeted image generation while preserving model utility and resisting defenses.
Contribution
We propose the first backdoor attack on the representation layer of self-supervised diffusion models, using PCA space hijacking and distribution constraints for high stealth and precision.
Findings
Outperforms existing attacks in FID and MSE metrics
Establishes reliable backdoors across various architectures
Resists state-of-the-art backdoor defenses
Abstract
Self-supervised diffusion models learn high-quality visual representations via latent space denoising. However, their representation layer poses a distinct threat: unlike traditional attacks targeting generative outputs, its unconstrained latent semantic space allows for stealthy backdoors, permitting malicious control upon triggering. In this paper, we propose BadRSSD, the first backdoor attack targeting the representation layer of self-supervised diffusion models. Specifically, it hijacks the semantic representations of poisoned samples with triggers in Principal Component Analysis (PCA) space toward those of a target image, then controls the denoising trajectory during diffusion by applying coordinated constraints across latent, pixel, and feature distribution spaces to steer the model toward generating the specified target. Additionally, we integrate representation dispersion…
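The PCA-space hijacking step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (fit_pca, project, pca_alignment_loss), the choice of fitting PCA on clean representations, the squared-error objective, and the random arrays standing in for encoder outputs are all assumptions made for illustration. It only shows what it could mean to measure how far triggered samples' representations sit from a target image's representation in a PCA subspace.

```python
import numpy as np

def fit_pca(reps, k):
    """Return (mean, components) for the top-k principal directions of reps (n, d)."""
    mean = reps.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(reps - mean, full_matrices=False)
    return mean, vt[:k]

def project(reps, mean, components):
    """Map representations into the k-dimensional PCA space."""
    return (reps - mean) @ components.T

def pca_alignment_loss(poisoned_reps, target_rep, mean, components):
    """Mean squared distance, in PCA space, between poisoned and target representations.

    Hypothetical objective: the paper's actual loss may differ.
    """
    z_poisoned = project(poisoned_reps, mean, components)
    z_target = project(target_rep[None, :], mean, components)
    return float(np.mean((z_poisoned - z_target) ** 2))

# Random stand-ins for encoder outputs (assumed shapes, not real data).
rng = np.random.default_rng(0)
clean_reps = rng.normal(size=(256, 32))    # representations of clean images
target_rep = rng.normal(size=32)           # representation of the target image
poisoned_reps = rng.normal(size=(16, 32))  # representations of triggered samples

mean, components = fit_pca(clean_reps, k=8)
loss_before = pca_alignment_loss(poisoned_reps, target_rep, mean, components)

# Move each poisoned representation halfway toward the target.
moved = poisoned_reps + 0.5 * (target_rep - poisoned_reps)
loss_after = pca_alignment_loss(moved, target_rep, mean, components)
```

Because the projection is linear, halving the distance to the target in representation space halves it in PCA space as well, so loss_after is one quarter of loss_before. In a training setting, a term of this kind would be minimized alongside the latent, pixel, and feature distribution constraints the abstract mentions, which this sketch does not attempt to reproduce.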
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
