Improving Black-Box Generative Attacks via Generator Semantic Consistency

Jongoh Jeong; Hunmin Yang; Jaeseok Jeong; Kuk-Jin Yoon

arXiv:2506.18248 · cs.CV · March 16, 2026


TL;DR

This paper introduces a method to improve black-box generative adversarial attacks by enforcing semantic consistency within the generator, leading to more effective transferability without increasing inference time.

Contribution

It proposes a novel semantic consistency objective that uses an EMA teacher to stabilize generator features, improving transferability in black-box attacks while maintaining efficiency.
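The mechanism can be sketched in a few lines. The NumPy sketch below is a minimal illustration under stated assumptions, not the authors' implementation: parameters are stored as a dict of arrays, the teacher tracks the student through an exponential moving average with a hypothetical `decay`, and the consistency term is a mean squared error over the first `num_early` generator blocks (a hypothetical cutoff). This page does not state the actual distance, decay value, or which blocks are aligned.

```python
import numpy as np

def ema_update(teacher, student, decay=0.999):
    """Return teacher parameters moved toward the student:
    t <- decay * t + (1 - decay) * s, applied to each named array."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}

def early_block_consistency(student_feats, teacher_feats, num_early=2):
    """Mean squared error between student and teacher features of the first
    `num_early` generator blocks (assumption: MSE as the distance). In training,
    the teacher features would be treated as constant targets."""
    pairs = zip(student_feats[:num_early], teacher_feats[:num_early])
    return float(np.mean([np.mean((s - t) ** 2) for s, t in pairs]))

# Toy usage with hypothetical parameter names and feature shapes.
student = {"conv1.weight": np.array([2.0, 2.0])}
teacher = {"conv1.weight": np.array([0.0, 2.0])}
teacher = ema_update(teacher, student, decay=0.5)  # conv1.weight becomes [1.0, 2.0]

s_feats = [np.ones((4, 8, 8)), np.zeros((4, 8, 8)), np.ones((4, 8, 8))]
t_feats = [np.zeros((4, 8, 8)), np.zeros((4, 8, 8)), np.zeros((4, 8, 8))]
loss = early_block_consistency(s_feats, t_feats, num_early=2)  # third block ignored
```

In a training loop, the generator's objective would then be the baseline attack loss plus a weighted consistency term (a hypothetical weight such as `lam`), with `ema_update` called after each optimizer step. Because the teacher is only used during training, nothing is added at inference time.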

Findings

01

Reduced semantic drift in generator representations.

02

Improved attack transferability across models and tasks.

03

Enhanced evaluation with the new ACR metric.

Abstract

Transfer attacks optimize on a surrogate and deploy to a black-box target. Iterative optimization attacks in this paradigm are limited by their per-input cost: multistep gradient updates for each input limit efficiency and scalability. Generative attacks alleviate this by producing adversarial examples in a single forward pass at test time. However, current generative attacks still adhere to optimizing surrogate losses (e.g., feature divergence) and overlook the generator's internal dynamics, leaving underexplored how the generator's internal representations shape transferable perturbations. To address this, we enforce semantic consistency by aligning the generator's early intermediate features to an EMA teacher, stabilizing object-aligned representations and improving black-box transfer without inference-time overhead. To ground the mechanism, we quantify semantic stability as…
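The cost contrast at the start of the abstract can be made concrete with a toy example. The NumPy sketch below is not from the paper: it assumes a hypothetical linear surrogate whose true-class margin is `y * (w @ x)`, uses an I-FGSM-style sign-gradient loop as the iterative attack, and uses a fixed matrix `G` as a stand-in for an already-trained generator. The iterative attack spends `steps` gradient evaluations on every input, while the generator needs one forward pass.

```python
import numpy as np

def iterative_attack(x, y, w, eps=0.03, alpha=0.01, steps=10):
    """I-FGSM-style attack on a toy linear surrogate with margin y * (w @ x).
    Every step needs a fresh gradient, so per-input cost grows with `steps`."""
    x_adv = x.copy()
    grad_calls = 0
    for _ in range(steps):
        grad = y * w                                  # d(margin)/dx for the linear surrogate
        grad_calls += 1
        x_adv = x_adv - alpha * np.sign(grad)         # step that lowers the true-class margin
        x_adv = np.clip(x_adv, x - eps, x + eps)      # project back into the L_inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)              # keep a valid pixel range
    return x_adv, grad_calls

def generative_attack(x, G, eps=0.03):
    """One forward pass of a (hypothetical, pre-trained) generator; tanh bounds
    the perturbation to [-eps, eps] and no gradients are needed at test time."""
    return np.clip(x + eps * np.tanh(G @ x), 0.0, 1.0)

x = np.full(4, 0.5)
w = np.array([1.0, -1.0, 2.0, 0.0])
x_it, calls = iterative_attack(x, y=1.0, w=w)   # 10 gradient evaluations for this input
x_gen = generative_attack(x, G=np.eye(4))       # a single forward pass
```

In a real attack each gradient evaluation is a full forward and backward pass through a deep surrogate, which is why amortizing the optimization into a trained generator pays off at test time.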

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. The paper is well-written and easy to follow.
2. The proposed method is simple and intuitive.
3. The authors provide sufficient experiments across different models and data domains to demonstrate the effectiveness of their method.
4. The experiments on intermediate block-level analysis are interesting and provide evidence that image contours and shapes are preserved.

Weaknesses

1. The designed method is simple and intuitive but somewhat lacks novelty, as feature-level guidance has been widely investigated in existing studies.
2. As shown in Table 2, the proposed method brings only marginal improvements for powerful attacks such as PDCL, raising concerns about its necessity and effectiveness.
3. Experiments against input-processing-based defense methods are lacking.
4. In line 69, a full stop is missing after "generator intermediate blocks".

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The self-feature consistency loss is well-motivated and mathematically specified.
2. The experiments are comprehensive, including multiple model architectures, cross-domain and cross-task scenarios, and fine-grained ablations.
3. The finding that semantic drift across generator layers degrades black-box transferability is novel.

Weaknesses

1. The baseline methods should be introduced before the experimental results are presented; the current order significantly reduces readability.
2. Although the method claims to be architecture-agnostic, there is room for a stronger demonstration across more varied generator or victim types (e.g., diffusion).

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. Simple, orthogonal mechanism: The EMA teacher + early-block consistency integrates into several strong generative baselines without test-time overhead; the approach is easy to adopt.
2. Clear, generator-internal motivation: The diagnostic showing that early generator features retain object contours while later ones blur them is compelling and grounded in measurable variability (foreground-IoU std across blocks).
3. Broad evaluation: Cross-model results span CNN, ViT, Mixer, and SSM/Mamba families…
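The "foreground-IoU std across blocks" diagnostic mentioned above can be read as in the sketch below. This is one plausible construction, not the paper's protocol, and every choice in it is a hypothetical assumption: each block's activation map is the channel-mean absolute activation, min-max rescaled to [0, 1] and thresholded at `thresh`; the foreground mask is already resized to the block's spatial resolution; and variability is the population standard deviation of the per-block IoUs.

```python
import numpy as np

def block_foreground_iou(feat, fg_mask, thresh=0.5):
    """IoU between a thresholded activation map and a binary foreground mask.
    feat: (C, H, W) features of one generator block.
    fg_mask: (H, W) boolean mask at the same spatial resolution."""
    act = np.abs(feat).mean(axis=0)                    # (H, W) activation magnitude
    span = act.max() - act.min()
    # Min-max rescale to [0, 1]; a constant map carries no shape information.
    act = (act - act.min()) / span if span > 0 else np.zeros_like(act)
    pred = act >= thresh
    inter = np.logical_and(pred, fg_mask).sum()
    union = np.logical_or(pred, fg_mask).sum()
    return float(inter / union) if union > 0 else 0.0

def iou_std_across_blocks(block_feats, fg_mask, thresh=0.5):
    """Per-block foreground IoUs and their standard deviation across blocks."""
    ious = [block_foreground_iou(f, fg_mask, thresh) for f in block_feats]
    return float(np.std(ious)), ious

# Toy usage: a contour-preserving early block vs. a fully "blurred" later block.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
sharp = mask[None].astype(float)      # activations only on the object region
blurred = np.ones((1, 4, 4))          # uniform activations, object shape lost
std, ious = iou_std_across_blocks([sharp, blurred], mask)
```

A large spread signals that some blocks track the object while others do not, which is the kind of variability the reviewer refers to.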

Weaknesses

1. Scope of gains & negative deltas. While averages improve, several cells in Table 2 are near-zero or negative (notably for certain transformer targets and PDCL).
2. Frequency analysis lacks operational detail. The spectral-energy analysis is interesting but currently underspecified. Precisely define the transform (e.g., 2-D FFT with magnitude spectrum), the radial banding scheme (cutoffs in normalized frequency), and whether energies are computed on perturbations or activations. Provide explicit…
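The specification Reviewer 03 requests could look like the sketch below. It is an illustration under assumptions that do not come from the paper: energy is the squared magnitude of the 2-D FFT of a single perturbation channel, radial frequency is measured in cycles per sample (Nyquist is 0.5 per axis, so the corners reach about 0.707), and `cutoffs` is a hypothetical choice of band edges.

```python
import numpy as np

def radial_band_energy(x, cutoffs=(0.25, 0.5)):
    """Fraction of the spectral energy of a 2-D array `x` (e.g., one channel
    of a perturbation) in each radial frequency band.
    Bands are [0, c1), [c1, c2), ..., [c_last, inf) in cycles/sample."""
    h, w = x.shape
    power = np.abs(np.fft.fft2(x)) ** 2              # energy per frequency bin
    fy = np.fft.fftfreq(h)[:, None]                  # row frequencies
    fx = np.fft.fftfreq(w)[None, :]                  # column frequencies
    radius = np.sqrt(fx ** 2 + fy ** 2)              # radial frequency of each bin
    edges = [0.0, *cutoffs, np.inf]
    total = power.sum()
    if total == 0:
        return [0.0] * (len(edges) - 1)              # an all-zero input has no energy
    return [float(power[(radius >= lo) & (radius < hi)].sum() / total)
            for lo, hi in zip(edges[:-1], edges[1:])]

flat = np.ones((8, 8))                               # only the DC component
checker = (-1.0) ** np.indices((8, 8)).sum(axis=0)   # highest spatial frequency
print(radial_band_energy(flat))      # energy concentrated in the lowest band
print(radial_band_energy(checker))   # energy concentrated in the highest band
```

Because the bands partition all frequency bins, the fractions sum to 1, so results are comparable across perturbations of different overall magnitude. Applying the same function to activations instead of perturbations is the other choice the reviewer asks the authors to state.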

