DeepDefense: Layer-Wise Gradient-Feature Alignment for Building Robust Neural Networks
Ci Lin, Tet Yeap, Iluju Kiringa, Biwei Zhang

TL;DR
DeepDefense introduces a layer-wise gradient-feature alignment method that enhances neural network robustness against adversarial attacks by smoothing the loss landscape and aligning gradients with internal features.
Contribution
It proposes a novel Gradient-Feature Alignment regularization across layers, improving robustness without architecture constraints.
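As a rough illustration of what a layer-wise cosine-based alignment penalty could look like, the sketch below sums `1 - cos(gradient, feature)` over layers. The pairing of each layer's input activations with the loss gradient at those activations, the `1 - cos` form, and the names `gfa_penalty`, `features`, `gradients`, and `weights` are all assumptions made for illustration; the summary does not give the paper's exact formulation.

```python
import math

def cosine(u, v, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def gfa_penalty(features, gradients, weights=None):
    """Sum over layers of weight * (1 - cos(gradient, feature)).

    features[i]  : activations fed into layer i (hypothetical layout)
    gradients[i] : loss gradient with respect to those activations
    weights[i]   : per-layer coefficient (assumed; defaults to 1.0)
    """
    if weights is None:
        weights = [1.0] * len(features)
    return sum(w * (1.0 - cosine(g, h))
               for w, h, g in zip(weights, features, gradients))

# Toy values: layer 0 is perfectly aligned, layer 1 is orthogonal.
features = [[1.0, 2.0], [0.0, 3.0]]
gradients = [[2.0, 4.0], [5.0, 0.0]]
penalty = gfa_penalty(features, gradients)
```

In an actual training loop, a term of this kind would be added to the task loss and differentiated through, which requires an autodiff framework with second-order gradient support; the pure-Python version here only evaluates the penalty.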
Findings
Outperforms standard adversarial training on CIFAR-10 by up to 15.2% under APGD attacks.
Forces DeepFool and EADEN attacks to use 20-30 times larger perturbations to fool the model.
Achieves a flatter loss landscape and stronger decision boundaries.
Abstract
Deep neural networks are known to be vulnerable to adversarial perturbations, which are small and carefully crafted inputs that lead to incorrect predictions. In this paper, we propose DeepDefense, a novel defense framework that applies Gradient-Feature Alignment (GFA) regularization across multiple layers to suppress adversarial vulnerability. By aligning input gradients with internal feature representations, DeepDefense promotes a smoother loss landscape in tangential directions, thereby reducing the model's sensitivity to adversarial noise. We provide theoretical insights into how adversarial perturbation can be decomposed into radial and tangential components and demonstrate that alignment suppresses loss variation in tangential directions, where most attacks are effective. Empirically, our method achieves significant improvements in robustness across both gradient-based and…
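The radial/tangential argument in the abstract can be checked on a toy example. The sketch below assumes that "radial" means the component of a perturbation parallel to the feature vector `x` and "tangential" means the remainder, and it uses the first-order approximation that the loss change equals the gradient dotted with the perturbation. All vectors and names are made up for illustration and are not taken from the paper.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def decompose(delta, x):
    """Split delta into a part parallel to x (radial) and the rest (tangential)."""
    scale = dot(delta, x) / dot(x, x)
    radial = [scale * xi for xi in x]
    tangential = [d - r for d, r in zip(delta, radial)]
    return radial, tangential

x = [3.0, 4.0]                  # feature vector (toy values)
delta = [0.1, -0.2]             # perturbation (toy values)
grad_aligned = [0.6, 0.8]       # gradient parallel to x
grad_misaligned = [0.8, -0.6]   # gradient orthogonal to x

radial, tangential = decompose(delta, x)

# First-order loss change caused by the tangential part of the perturbation.
change_aligned = dot(grad_aligned, tangential)
change_misaligned = dot(grad_misaligned, tangential)
```

With the aligned gradient, the tangential component contributes essentially nothing to the first-order loss change, while the misaligned gradient picks up the full tangential contribution. This is the sense in which alignment flattens the loss in tangential directions.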
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
I honestly don't see anything positive in this paper. I really don't.
In the related work section (lines 136-137), the authors argue that the GNR has little effect on enhancing the robustness of neural networks. I am curious about the source of this claim. Are there any experiments that support this claim? Also, in the related work section (lines 150-151), the authors argue that the GradAlign method, for instance, requires a robust neural network or a pre-trained teacher model. This claim is factually wrong. The GradAlign
The paper introduces an intuitive mechanism that connects gradient alignment and feature representation to local loss smoothness, providing geometric intuition for robustness. The layer-wise GFA regularization is easy to integrate and lightweight, requiring only minor changes to standard training pipelines.
#### **Novelty concerns**
A central issue with this paper is its lack of novelty compared to prior gradient alignment defenses. In particular, the core principle of encouraging gradient alignment as a regularizer for adversarial training has already been extensively studied, most notably in *Understanding and Improving Fast Adversarial Training* (Andriushchenko & Flammarion, NeurIPS 2020), where the authors propose GradAlign, a method that maximizes the gradient alignment between inputs and thei
1. Simple and intuitive idea: The proposed Gradient-Feature Alignment regularization is conceptually straightforward and easy to implement.
2. Effective on small models and datasets: The experiments show that DeepDefense can enhance robustness for lightweight networks on small datasets.
1. Limited experimental scope:
   - The evaluation is restricted to outdated and small datasets (CIFAR-10, Fashion-MNIST). For a robustness paper, larger and more challenging datasets (e.g., CIFAR-100, Tiny-ImageNet) are necessary.
   - The network architectures (CNN, MLP) are too simple. The study should include modern architectures such as ResNet or WideResNet to demonstrate scalability.
   - The attack methods are mostly from 4–5 years ago. The paper lists the components of AutoAttack (AP
- Simple, model-agnostic regularizer: GFA is easy to plug in (cosine penalty), with clear training recipe and minimal overhead.
- Compelling geometric intuition + diagnostics: Radial/tangential view is clear, supported by gradient/feature visualizations and layer-wise analysis.
- Thoughtful ablations & broad attack menu: FIRST vs DEEP comparisons, ε-sweeps, and many attack families provide a reasonably thorough study.
- Non-standard evaluation choices: Tests are run only on samples all models classify correctly, and AutoAttack components are reported separately instead of the canonical combined AA; both choices can inflate robustness and hinder comparability.
- Attack configs under-specified / possibly weak: Extremely high accuracies under FAB (and others) suggest misconfiguration; key details like ε, steps, and restarts aren’t consistently specified.
- Missing clean-accuracy trade-offs
- Narrow scope of models/da
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Advanced Neural Network Applications
