Democratic Training Against Universal Adversarial Perturbations
Bing Sun, Jun Sun, Wei Zhao

TL;DR
This paper introduces Democratic Training, a defense against universal adversarial perturbations (UAPs) that uses entropy-based model enhancement to reduce attack success rates while maintaining accuracy on clean inputs.
Contribution
It proposes Democratic Training, an entropy-based method that mitigates universal adversarial perturbations, improving robustness without sacrificing clean data accuracy.
Findings
Reduces attack success rate across multiple models and datasets.
Enhances model robustness against various universal adversarial attacks.
Maintains high accuracy on clean samples while defending against UAPs.
Abstract
Despite their advances and success, real-world deep neural networks are known to be vulnerable to adversarial attacks. Universal adversarial perturbation, an input-agnostic attack, poses a serious threat for them to be deployed in security-sensitive systems. In this case, a single universal adversarial perturbation deceives the model on a range of clean inputs without requiring input-specific optimization, which makes it particularly threatening. In this work, we observe that universal adversarial perturbations usually lead to an abnormal entropy spectrum in hidden layers, which suggests that the prediction is dominated by a small number of "features" in such cases (rather than democratically by many features). Inspired by this, we propose an efficient yet effective defense method for mitigating UAPs called Democratic Training by performing entropy-based model enhancement to…
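The entropy observation in the abstract can be illustrated with a minimal sketch. It assumes that a layer's activation magnitudes are normalized into a probability distribution and scored with Shannon entropy; the paper's exact entropy definition is not given in this summary, and the function name `activation_entropy` and the sample values are hypothetical.

```python
import math


def activation_entropy(activations):
    """Shannon entropy (in nats) of one layer's activations.

    Assumption: activation magnitudes are normalized to sum to 1 and
    treated as a probability distribution. The paper's own definition
    may differ.
    """
    magnitudes = [abs(a) for a in activations]
    total = sum(magnitudes)
    if total == 0.0:
        return 0.0  # no active features: define the entropy as zero
    probs = [m / total for m in magnitudes]
    # Skip zero-probability entries, since p * log(p) tends to 0 as p -> 0.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical activations: many features contributing equally versus
# a single feature dominating the layer.
democratic = [1.0, 1.0, 1.0, 1.0]
dominated = [10.0, 0.01, 0.01, 0.01]

print(activation_entropy(democratic))
print(activation_entropy(dominated))
```

Under this assumed definition, the evenly spread activations reach the maximum entropy for four features (log 4), while the dominated layer scores far lower, which is the kind of low-entropy signature the abstract attributes to UAP-perturbed inputs.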
Peer Reviews
Decision·ICLR 2025 Poster
1. This paper is well written and easy to follow. 2. The paper makes a commendable observation concerning the entropy spectrum in deep neural network layers, which is a significant contribution to the field and forms the basis for the proposed defense mechanism. 3. The efficiency of the proposed Democratic Training method is noteworthy. It circumvents the need to generate UAPs during training, instead utilizing a limited number of epochs to identify low-entropy examples, which is a resourceful a
1. The threat model employed in the experiments primarily utilizes gradient-based attack methods. These methods presuppose access to the model's parameters, aligning with white-box attack scenarios. This appears to be at odds with the assertion in Section 2.3 that adversarial knowledge does not extend to the internal parameters of the model. Clarification on this point would be beneficial. 2. The comparison with adversarial training methods may require further refinement. Adversarial training a
1. The use of entropy to reveal the dominance of UAPs and the concept of Democratic Training as a defense mechanism are innovative. 2. The method was evaluated across various neural network architectures and benchmark datasets, which strengthens the claim of its general applicability. 3. Unlike other defense methods, Democratic Training does not require architectural modifications, which makes it easy to integrate into existing systems.
1. The evaluation focused primarily on benchmark datasets and common UAP generation methods. It would be beneficial to see how this approach performs on more sophisticated and adaptive attacks, such as adversarial examples generated in dynamic environments. 2. The proposed method mainly works well on CNNs. The authors should validate it on more types of networks, such as transformers. 3. The method requires access to a small set of clean data for entropy measurement and training, which might not alw
1. The experiments are comprehensive. 2. The proposed defense is attack-agnostic, which is more practical and efficient. 3. The proposed defense largely reduced the targeted attack success rate. I tend to accept this paper. However, since I'm not familiar with UAP attack and defense baseline methods, I will listen to other reviewers and public comments and then decide.
1. The UAP attacks evaluated in the paper were published in 2018, 2019, and 2020 and seem out of date. 2. After Democratic Training, there is still a gap between "AAcc." and clean accuracy. I wonder about the effectiveness of Democratic Training against non-targeted UAPs. 3. The average results in Tables 4 and 5 are ambiguous, since there can be a large bias among different networks.
Taxonomy
Topics: Adversarial Robustness in Machine Learning
