Explore the vulnerability of black-box models via diffusion models

Jiacheng Shi; Yanfu Zhang; Huajie Shao; Ashley Gao

arXiv:2506.07590 · cs.CV · June 10, 2025

Open Access

TL;DR

This paper reveals a new security vulnerability where diffusion models can be exploited to generate synthetic data for training substitute models, enabling efficient model extraction and adversarial attacks on black-box models with minimal queries.

Contribution

The study introduces a novel attack method leveraging diffusion models to perform black-box model extraction and adversarial attacks with high success and low query cost.

Findings

01 Achieves a 98.68% success rate in adversarial attacks across benchmarks.

02 Improves attack effectiveness by 27.37% over state-of-the-art methods.

03 Uses only 0.01 times (1% of) the query budget to train effective substitute models.

Abstract

Recent advancements in diffusion models have enabled high-fidelity and photorealistic image generation across diverse applications. However, these models also present security and privacy risks, including copyright violations, sensitive information leakage, and the creation of harmful or offensive content that could be exploited maliciously. In this study, we uncover a novel security threat where an attacker leverages diffusion model APIs to generate synthetic images, which are then used to train a high-performing substitute model. This enables the attacker to execute model extraction and transfer-based adversarial attacks on black-box classification models with minimal queries, without needing access to the original training data. The generated images are sufficiently high-resolution and diverse to train a substitute model whose outputs closely match those of the target model. Across…
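
The pipeline the abstract describes (generate synthetic inputs, label them by querying the target, fit a substitute, then transfer perturbations crafted on the substitute) can be illustrated with a toy sketch. This is not the paper's implementation. Everything here is an assumption made for illustration: the hidden linear classifier behind `query_target` stands in for the black-box model, Gaussian vectors from `generate_synthetic` stand in for images returned by a diffusion model API, the substitute is plain softmax regression, and the perturbation is a generic one-step sign-gradient method that the abstract does not name.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_CLASSES = 16, 3

# Hypothetical black-box target: a fixed linear classifier behind a label-only API.
_W_SECRET = rng.normal(size=(DIM, N_CLASSES))

def query_target(x):
    """Stand-in for the black-box API: returns hard labels, nothing else."""
    return np.argmax(x @ _W_SECRET, axis=1)

def generate_synthetic(n):
    """Placeholder for images from a diffusion model API (here: Gaussian vectors)."""
    return rng.normal(size=(n, DIM))

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_substitute(x, y, lr=0.5, epochs=300):
    """Fit a softmax-regression substitute on (synthetic input, target label) pairs."""
    w = np.zeros((DIM, N_CLASSES))
    onehot = np.eye(N_CLASSES)[y]
    for _ in range(epochs):
        # Gradient of the mean cross-entropy loss with respect to the weights.
        w -= lr * x.T @ (softmax(x @ w) - onehot) / len(x)
    return w

def sign_gradient_step(x, y, w, eps):
    """One-step sign-gradient perturbation computed only from the substitute."""
    grad_x = (softmax(x @ w) - np.eye(N_CLASSES)[y]) @ w.T
    return x + eps * np.sign(grad_x)

# 1) Extraction: label synthetic inputs with the target, then fit the substitute.
x_syn = generate_synthetic(2000)
y_syn = query_target(x_syn)  # the only queries spent on training
w_sub = train_substitute(x_syn, y_syn)

# 2) Fidelity: how often the substitute agrees with the target on fresh inputs.
x_test = generate_synthetic(1000)
y_test = query_target(x_test)  # queried here for evaluation only
agreement = np.mean(np.argmax(x_test @ w_sub, axis=1) == y_test)

# 3) Transfer: perturb using the substitute, then check whether the target's label flips.
x_adv = sign_gradient_step(x_test, y_test, w_sub, eps=1.0)
transfer_rate = np.mean(query_target(x_adv) != y_test)

print(f"agreement={agreement:.3f}  transfer_rate={transfer_rate:.3f}")
```

The two printed quantities loosely correspond to the ideas in the abstract: `agreement` measures how closely the substitute's outputs match the target's, and `transfer_rate` measures how often a perturbation built without gradient access to the target still changes its prediction. The toy numbers say nothing about the figures reported in the paper.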

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection

Methods: Diffusion