TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems
Wenbo Guo, Lun Wang, Xinyu Xing, Min Du, Dawn Song

TL;DR
TABOR introduces a novel optimization-based method guided by explainable AI to accurately detect and restore trojan backdoors in neural networks, overcoming limitations of previous techniques that relied on unrealistic assumptions.
Contribution
The paper proposes TABOR, a new trojan detection approach that formulates detection as a non-convex optimization problem with a novel objective function and a new quality metric.
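To make "detection as an optimization problem" concrete, here is a minimal sketch of trigger restoration on a toy model. It is not TABOR's objective function: it assumes the generic mask-and-pattern formulation used by optimization-based detectors (minimize the loss toward a target label plus an L1 penalty on the mask), and it omits the paper's novel objective terms and quality metric. The linear softmax model, the planted backdoor on feature 7, and the names `restore_trigger`, `lam`, `lr`, and `steps` are all hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def apply_trigger(x, mask, pattern):
    # Keep the input where mask is 0, paste the pattern where mask is 1.
    return (1.0 - mask) * x + mask * pattern

def restore_trigger(W, b, x_clean, target, lam=0.01, lr=0.5, steps=300):
    """Gradient descent on mean cross-entropy(target) + lam * sum(mask).

    Assumed, generic formulation; NOT the objective proposed in the paper.
    """
    n, d = x_clean.shape
    theta_m = np.full(d, -2.0)  # sigmoid(theta_m) = mask, starts small
    theta_p = np.zeros(d)       # sigmoid(theta_p) = pattern, starts at 0.5
    onehot = np.zeros((n, W.shape[1]))
    onehot[:, target] = 1.0
    for _ in range(steps):
        mask, pattern = sigmoid(theta_m), sigmoid(theta_p)
        probs = softmax(apply_trigger(x_clean, mask, pattern) @ W + b)
        g_x = (probs - onehot) @ W.T / n  # d(loss)/d(triggered input)
        g_mask = (g_x * (pattern - x_clean)).sum(axis=0) + lam
        g_pattern = (g_x * mask).sum(axis=0)
        # Chain rule through the sigmoid reparameterization.
        theta_m -= lr * g_mask * mask * (1.0 - mask)
        theta_p -= lr * g_pattern * pattern * (1.0 - pattern)
    return sigmoid(theta_m), sigmoid(theta_p)

# Hypothetical toy model: 8 features, 3 classes, linear softmax classifier.
rng = np.random.default_rng(0)
W = np.zeros((8, 3))
W[0:2, 0] = 2.0  # features 0-1 indicate class 0
W[2:4, 1] = 2.0  # features 2-3 indicate class 1
W[4:6, 2] = 2.0  # features 4-5 indicate class 2
W[7, 2] = 8.0    # planted backdoor: feature 7 forces class 2
b = np.zeros(3)

# Clean samples from classes 0 and 1; the backdoor feature is never set.
labels = np.arange(20) % 2
x_clean = 0.1 * rng.random((20, 8))
for i, c in enumerate(labels):
    x_clean[i, 2 * c:2 * c + 2] += 0.8
x_clean[:, 7] = 0.0

mask, pattern = restore_trigger(W, b, x_clean, target=2)
preds = (apply_trigger(x_clean, mask, pattern) @ W + b).argmax(axis=1)
success_rate = float((preds == 2).mean())
print("largest mask entry at feature", int(mask.argmax()))
print("attack success rate with restored trigger:", success_rate)
```

In this toy setting the restored mask concentrates on the planted feature, which is the signal a detector would inspect. A real detector would repeat the search for every candidate target label and then judge the quality of each restored trigger; that judging step is where the paper's metric comes in, and it is not modeled here.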
Findings
TABOR outperforms existing methods in detecting trojan backdoors.
It effectively restores high-fidelity trojan trigger images.
The new metric improves detection accuracy and reduces false alarms.
Abstract
A trojan backdoor is a hidden pattern typically implanted in a deep neural network. It is activated, forcing the infected model to behave abnormally, only when an input sample containing a particular trigger is fed to that model. As such, given a deep neural network model and clean input samples, it is very challenging to inspect the model and determine whether a trojan backdoor exists. Recently, researchers have designed and developed several pioneering solutions to address this acute problem, and have demonstrated that the proposed techniques have great potential in trojan detection. However, we show that none of these existing techniques completely addresses the problem. On the one hand, they mostly work under an unrealistic assumption (e.g., assuming availability of the contaminated training database). On the other hand, the proposed techniques cannot accurately detect the existence of…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Integrated Circuits and Semiconductor Failure Analysis · Anomaly Detection Techniques and Applications
