Gray-box Adversarial Training
Vivek B.S., Konda Reddy Mopuri, and R. Venkatesh Babu

TL;DR
This paper introduces Gray-box Adversarial Training, a novel method that improves model robustness against adversarial attacks by using intermediate models to generate more effective adversaries, addressing weaknesses of existing training and evaluation methods.
Contribution
It proposes a new Gray-box Adversarial Training approach and a novel evaluation method, enhancing robustness assessment and training effectiveness against adversarial attacks.
Findings
Models trained with the proposed method show increased robustness.
The proposed evaluation method reveals weaknesses in existing adversarial training.
The gray-box approach outperforms traditional methods in robustness tests.
Abstract
Adversarial samples are perturbed inputs crafted to mislead machine learning systems. A training mechanism called adversarial training, which presents adversarial samples along with clean samples, has been introduced to learn robust models. To scale adversarial training to large datasets, these perturbations can only be crafted using fast and simple methods (e.g., gradient ascent). However, it has been shown that adversarial training converges to a degenerate minimum, where the model appears to be robust by generating weaker adversaries. As a result, the models are vulnerable to simple black-box attacks. In this paper, we (i) demonstrate the shortcomings of the existing evaluation policy, (ii) introduce novel variants of white-box and black-box attacks, dubbed "gray-box adversarial attacks", based on which we propose a novel evaluation method to assess the robustness of the learned…
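To make the abstract's description of adversarial training concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy logistic-regression classifier and uses a single-step signed-gradient perturbation (in the style of the fast gradient sign method) as one instance of the "fast and simple" gradient-ascent attacks the abstract mentions. All function names, the model, and the parameters `eps` and `lr` are illustrative assumptions; the gray-box variant proposed in the paper is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # Probability of class 1 under a toy logistic-regression model (assumed here).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(w, b, x, y):
    # Binary cross-entropy, clamped to avoid log(0).
    p = min(max(predict(w, b, x), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def input_gradient(w, b, x, y):
    # Gradient of the cross-entropy loss with respect to the input x: (p - y) * w.
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def single_step_attack(w, b, x, y, eps):
    # One step of signed gradient ascent on the loss, with step size eps.
    g = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def adversarial_training_step(w, b, batch, eps, lr):
    # Present each clean sample together with an adversarial copy crafted
    # against the current parameters, then take one gradient-descent step.
    mixed = []
    for x, y in batch:
        mixed.append((x, y))
        mixed.append((single_step_attack(w, b, x, y, eps), y))
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in mixed:
        err = predict(w, b, x) - y
        for i, xi in enumerate(x):
            grad_w[i] += err * xi
        grad_b += err
    n = len(mixed)
    new_w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
    new_b = b - lr * grad_b / n
    return new_w, new_b
```

In this sketch the adversary is always crafted from the model being trained, which is the setting the abstract says can converge to a degenerate minimum: the model can lower its adversarial loss by making its own single-step adversaries weaker instead of becoming robust.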
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
