BadRSSD: Backdoor Attacks on Regularized Self-Supervised Diffusion Models

Jiayao Wang, Yiping Zhang, Mohammad Maruf Hasan, Xiaoying Lei, Jiale Zhang, Junwu Zhu, Qilin Wu, and Dongfang Zhao

arXiv:2603.01019 · cs.CR · March 3, 2026

TL;DR

This paper introduces BadRSSD, a novel backdoor attack on self-supervised diffusion models' representation layer, enabling stealthy, targeted image generation while maintaining model utility and resisting defenses.

Contribution

We propose the first backdoor attack on the representation layer of self-supervised diffusion models, using PCA space hijacking and distribution constraints for high stealth and precision.

Findings

01 Outperforms existing attacks in FID and MSE metrics

02 Establishes reliable backdoors across various architectures

03 Resists state-of-the-art backdoor defenses

Abstract

Self-supervised diffusion models learn high-quality visual representations via latent space denoising. However, their representation layer poses a distinct threat: unlike traditional attacks targeting generative outputs, its unconstrained latent semantic space allows for stealthy backdoors, permitting malicious control upon triggering. In this paper, we propose BadRSSD, the first backdoor attack targeting the representation layer of self-supervised diffusion models. Specifically, it hijacks the semantic representations of poisoned samples with triggers in Principal Component Analysis (PCA) space toward those of a target image, then controls the denoising trajectory during diffusion by applying coordinated constraints across latent, pixel, and feature distribution spaces to steer the model toward generating the specified target. Additionally, we integrate representation dispersion…
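
The abstract's first step, pulling the representations of triggered samples toward a target image's representation in PCA space, can be illustrated with a small numerical sketch. This is not the authors' implementation: the abstract is truncated and does not give the loss, so the mean-squared alignment term, the choice to fit PCA on clean representations, and all names here (`fit_pca`, `project`, `pca_alignment_loss`, the array shapes) are assumptions made for illustration only.

```python
import numpy as np

def fit_pca(reps, k):
    """Return (mean, components) for the top-k principal directions of reps, shape (n, d)."""
    mean = reps.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(reps - mean, full_matrices=False)
    return mean, vt[:k]

def project(reps, mean, components):
    """Map representations of shape (n, d) to PCA coordinates of shape (n, k)."""
    return (reps - mean) @ components.T

def pca_alignment_loss(poisoned_reps, target_rep, mean, components):
    """Assumed loss: mean squared distance to the target in PCA space."""
    z_poisoned = project(poisoned_reps, mean, components)      # (n, k)
    z_target = project(target_rep[None, :], mean, components)  # (1, k)
    return float(np.mean((z_poisoned - z_target) ** 2))

# Synthetic stand-ins for encoder outputs (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
clean = rng.normal(size=(256, 32))     # representations of clean images
target = rng.normal(size=32)           # representation of the attacker's target image
poisoned = rng.normal(size=(16, 32))   # representations of triggered images

mean, comps = fit_pca(clean, k=8)
loss_before = pca_alignment_loss(poisoned, target, mean, comps)
# If every poisoned representation equalled the target, the loss would vanish.
loss_aligned = pca_alignment_loss(np.tile(target, (16, 1)), target, mean, comps)
```

In a training loop, a term of this kind would be minimized with respect to the encoder's parameters on triggered inputs. The latent, pixel, and feature-distribution constraints that the abstract also mentions are not modeled here.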

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning