Exploiting Explanations for Model Inversion Attacks
Xuejun Zhao, Wencan Zhang, Xiaokui Xiao, Brian Y. Lim

TL;DR
This paper demonstrates that the explanations provided alongside AI model predictions can be exploited to reconstruct private images, revealing significant privacy risks and the need for new privacy-preserving techniques in explainable AI.
Contribution
The study introduces novel multi-modal transposed CNN architectures for model inversion using explanations, and analyzes how different explanation types impact privacy risks.
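To make the multi-modal idea concrete, here is a minimal sketch, not the authors' architecture. It assumes the attacker observes a prediction vector and a 2D explanation map, fuses the two as channels of a small feature map, and upsamples with two transposed convolutions. All names (`invert`, `init_params`, `project_prediction`), the layer sizes, and the random untrained weights are hypothetical, so the output only illustrates the data flow and shapes, not a meaningful reconstruction.

```python
import random

def transposed_conv2d(x, w, stride):
    """x: [C_in][H][W], w: [C_in][C_out][K][K].
    Returns [C_out][(H-1)*stride+K][(W-1)*stride+K] (no padding)."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w[0]), len(w[0][0])
    oh, ow = (h - 1) * stride + k, (wd - 1) * stride + k
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    for ci in range(c_in):
        for i in range(h):
            for j in range(wd):
                v = x[ci][i][j]
                # Each input value "stamps" a scaled copy of the kernel.
                for co in range(c_out):
                    for a in range(k):
                        for b in range(k):
                            out[co][i * stride + a][j * stride + b] += v * w[ci][co][a][b]
    return out

def avg_pool2d(m, factor):
    """Downsample a 2D map by averaging non-overlapping factor x factor blocks."""
    h, w = len(m) // factor, len(m[0]) // factor
    return [[sum(m[i * factor + a][j * factor + b]
                 for a in range(factor) for b in range(factor)) / factor ** 2
             for j in range(w)] for i in range(h)]

def project_prediction(pred, weights, size):
    """Linear map from the prediction vector to one size x size feature map."""
    flat = [sum(p * wt for p, wt in zip(pred, row)) for row in weights]
    return [flat[r * size:(r + 1) * size] for r in range(size)]

def init_params(num_classes, size, hidden_channels, seed=0):
    """Random, untrained weights (assumption: a real attack would train these)."""
    rng = random.Random(seed)
    rand = lambda: rng.uniform(-0.1, 0.1)
    kernel = lambda c_in, c_out: [[[[rand() for _ in range(2)] for _ in range(2)]
                                   for _ in range(c_out)] for _ in range(c_in)]
    return {"size": size,
            "proj": [[rand() for _ in range(num_classes)] for _ in range(size * size)],
            "up1": kernel(2, hidden_channels),
            "up2": kernel(hidden_channels, 1)}

def invert(pred, explanation, params):
    """Fuse prediction and explanation, then upsample to image resolution."""
    size = params["size"]
    pred_map = project_prediction(pred, params["proj"], size)
    expl_map = avg_pool2d(explanation, len(explanation) // size)
    fused = [pred_map, expl_map]  # two input channels, one per modality
    hidden = transposed_conv2d(fused, params["up1"], stride=2)
    hidden = [[[max(0.0, v) for v in row] for row in ch] for ch in hidden]  # ReLU
    return transposed_conv2d(hidden, params["up2"], stride=2)[0]

params = init_params(num_classes=10, size=4, hidden_channels=3)
pred = [0.1] * 10                                                    # toy prediction vector
explanation = [[float(i == j) for j in range(16)] for i in range(16)]  # toy saliency map
image = invert(pred, explanation, params)
print(len(image), len(image[0]))
```

Keeping the explanation as a 2D channel, rather than flattening it, is what lets the transposed convolutions use its spatial layout; a prediction-only baseline would drop `expl_map` from `fused`.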
Findings
Explanations significantly improve image reconstruction performance compared with using the target model's prediction alone.
Surrogate explanations can be exploited even for non-explainable models.
Privacy risks increase with certain explanation types and factors.
Abstract
The successful deployment of artificial intelligence (AI) in many domains, from healthcare to hiring, requires its responsible use, particularly with respect to model explanations and privacy. Explainable artificial intelligence (XAI) provides more information to help users understand model decisions, yet this additional knowledge exposes additional risks for privacy attacks. Hence, providing explanations can harm privacy. We study this risk for image-based model inversion attacks and identify several attack architectures with increasing performance to reconstruct private image data from model explanations. We have developed several multi-modal transposed CNN architectures that achieve significantly higher inversion performance than using the target model prediction only. These XAI-aware inversion models were designed to exploit the spatial knowledge in image explanations. To understand which…
