Proactive Disentangled Modeling of Trigger-Object Pairings for Backdoor Defense
Kyle Stein, Andrew A. Mahyari, Guillermo Francia III, Eman El-Sheikh

TL;DR
This paper presents DBOM, a proactive defense that uses structured disentanglement with Vision-Language Models to detect and neutralize both seen and unseen backdoor threats at the dataset level.
Contribution
Introduces DBOM, a novel disentangled modeling framework leveraging VLMs for proactive backdoor detection, capable of zero-shot generalization to unseen trigger-object pairs.
Findings
Robust detection of poisoned images on CIFAR-10 and GTSRB.
Significant improvement over existing backdoor defense methods.
Effective zero-shot generalization to unseen triggers.
Abstract
Deep neural networks (DNNs) and generative AI (GenAI) are increasingly vulnerable to backdoor attacks, where adversaries embed triggers into inputs to cause models to misclassify or misinterpret target labels. Beyond traditional single-trigger scenarios, attackers may inject multiple triggers across various object classes, forming unseen backdoor-object configurations that evade standard detection pipelines. In this paper, we introduce DBOM (Disentangled Backdoor-Object Modeling), a proactive framework that leverages structured disentanglement to identify and neutralize both seen and unseen backdoor threats at the dataset level. Specifically, DBOM factorizes input image representations by modeling triggers and objects as independent primitives in the embedding space through the use of Vision-Language Models (VLMs). By leveraging the frozen, pre-trained encoders of VLMs, our approach…
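The factorization idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the hand-written 4-dimensional vectors stand in for frozen VLM embeddings, and the primitive names, the additive composition rule, and the helper functions (`compose`, `classify_pair`, `is_poisoned`) are all assumptions made for illustration. It shows only how treating triggers and objects as separate primitives lets a scorer rank a trigger-object pair it has never seen together.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Hypothetical primitive embeddings. In a real system these would come from
# a frozen, pre-trained encoder; here they are toy orthogonal vectors.
OBJECTS = {
    "stop_sign":   [1.0, 0.0, 0.0, 0.0],
    "speed_limit": [0.0, 1.0, 0.0, 0.0],
}
TRIGGERS = {
    "clean": [0.0, 0.0, 0.0, 0.0],  # no trigger present
    "patch": [0.0, 0.0, 1.0, 0.0],
    "blend": [0.0, 0.0, 0.0, 1.0],
}

def compose(trigger, obj):
    """Build a pair embedding by adding the two primitives (an assumed rule)."""
    summed = [t + o for t, o in zip(TRIGGERS[trigger], OBJECTS[obj])]
    return normalize(summed)

def classify_pair(image_embedding):
    """Return the (trigger, object) pair whose composition best matches the image."""
    candidates = [(t, o) for t in TRIGGERS for o in OBJECTS]
    return max(candidates, key=lambda p: cosine(image_embedding, compose(*p)))

def is_poisoned(image_embedding):
    """Flag an image when its best-matching trigger primitive is not 'clean'."""
    trigger, _ = classify_pair(image_embedding)
    return trigger != "clean"

if __name__ == "__main__":
    clean_image = [0.9, 0.05, 0.02, 0.03]   # mostly the stop_sign direction
    unseen_pair = [0.1, 0.9, 0.0, 0.8]      # speed_limit + blend, a held-out pairing
    print(classify_pair(clean_image), is_poisoned(clean_image))
    print(classify_pair(unseen_pair), is_poisoned(unseen_pair))
```

Because every trigger is scored against every object, a pairing absent from any training list is still a candidate, which is the sense of zero-shot generalization this sketch is meant to convey. Flagged samples could then be dropped from the dataset before downstream training.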
