DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion

Hossein Mirzaei; Zeinab Taghavi; Sepehr Rezaee; Masoud Hadi; Moein Madadi; Mackenzie W. Mathis

arXiv:2507.22813 · cs.CV · July 31, 2025

TL;DR

DISTIL introduces a data-free, diffusion-based method for accurately reconstructing Trojan triggers in neural networks, enhancing backdoor detection without requiring training data or strong trigger assumptions.

Contribution

The paper proposes a zero-shot, diffusion-guided trigger-inversion technique that the authors report outperforms existing methods at identifying malicious triggers in neural networks.

Findings

01 Achieves up to 7.1% higher accuracy than existing methods on the BackdoorBench dataset.

02 Improves scanning of Trojaned object detection models by 9.4%.

03 Effectively distinguishes clean models from Trojaned models.

Abstract

Deep neural networks have demonstrated remarkable success across numerous tasks, yet they remain vulnerable to Trojan (backdoor) attacks, raising serious concerns about their safety in real-world mission-critical applications. A common countermeasure is trigger inversion -- reconstructing malicious "shortcut" patterns (triggers) inserted by an adversary during training. Current trigger-inversion methods typically search the full pixel space under specific assumptions but offer no assurances that the estimated trigger is more than an adversarial perturbation that flips the model output. Here, we propose a data-free, zero-shot trigger-inversion strategy that restricts the search space while avoiding strong assumptions on trigger appearance. Specifically, we incorporate a diffusion-based generator guided by the target classifier; through iterative generation, we produce candidate triggers…
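
The idea of a generator steered by the target classifier through iterative generation can be illustrated with a toy sketch. This is not the DISTIL implementation: a standard Gaussian prior stands in for the diffusion model, the classifier is a hypothetical 3-class linear model over 2-D inputs, and every name and parameter (`W`, `guided_sample`, `guidance`, `noise_scale`) is an assumption made for illustration. Each step combines the prior's score with the gradient of the classifier's log-probability for a chosen target class, in the style of classifier guidance.

```python
import math
import random

# Hypothetical 3-class linear classifier over 2-D inputs: logits = W @ x.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]


def softmax_probs(x):
    """Class probabilities of the toy classifier at input x."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in W]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(x, target):
    """Gradient of log p(target | x); for linear logits this is
    W[target] - sum_k p_k * W[k]."""
    p = softmax_probs(x)
    return [
        W[target][d] - sum(p[k] * W[k][d] for k in range(len(W)))
        for d in range(len(x))
    ]


def guided_sample(target, steps=200, step_size=0.1, guidance=4.0,
                  noise_scale=0.0, seed=0):
    """Langevin-style iteration: prior score (-x for a standard Gaussian,
    an assumed stand-in for a diffusion model) plus scaled classifier
    guidance. Noise is annealed linearly to zero; noise_scale=0 makes the
    loop deterministic."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for t in range(steps):
        g = grad_log_prob(x, target)
        sigma = noise_scale * (1.0 - t / steps)
        x = [
            x_d
            + step_size * (-x_d + guidance * g_d)
            + math.sqrt(2.0 * step_size) * sigma * rng.gauss(0.0, 1.0)
            for x_d, g_d in zip(x, g)
        ]
    return x


if __name__ == "__main__":
    x = guided_sample(target=0)
    print("sample:", x)
    print("class probabilities:", softmax_probs(x))
```

Starting from the origin, where the toy classifier is indifferent between classes, the guided iterations move the sample toward a region the classifier assigns to the target class, while the prior term keeps it from drifting arbitrarily far. In a real trigger-inversion setting the prior would be a pretrained diffusion model and the classifier would be the network under inspection; both are outside the scope of this sketch.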
