TL;DR
ProARD introduces a dynamic network training method that efficiently produces a wide range of robust student models from a single training process, reducing computational costs and supporting diverse edge device constraints.
Contribution
The paper proposes a dynamic network framework that uses weight sharing and sampling strategies to perform adversarial robustness distillation efficiently, so that student networks can be extracted without retraining.
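To make the weight-sharing idea concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a hypothetical search space over the width, depth, and expansion dimensions named in the abstract, and a toy `DynamicLinear` layer in which a narrower student reuses the leading rows of one shared weight matrix. The names `SEARCH_SPACE`, `sample_config`, and `DynamicLinear`, and all numeric choices, are illustrative assumptions.

```python
import random

# Hypothetical search space: the actual ProARD choices are not given in this summary.
SEARCH_SPACE = {
    "width": [0.5, 0.75, 1.0],
    "depth": [1, 2, 3],
    "expansion": [2, 4, 6],
}


def sample_config(rng):
    """Uniformly sample one student configuration (an assumed sampling strategy)."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}


class DynamicLinear:
    """Toy linear layer whose narrower variants reuse rows of one shared weight matrix."""

    def __init__(self, in_features, max_out_features, rng):
        self.weight = [
            [rng.uniform(-1.0, 1.0) for _ in range(in_features)]
            for _ in range(max_out_features)
        ]

    def forward(self, x, width=1.0):
        # Activate only the leading output units; the weights themselves are shared.
        active = max(1, round(width * len(self.weight)))
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight[:active]]


rng = random.Random(0)
layer = DynamicLinear(in_features=4, max_out_features=8, rng=rng)
x = [1.0, 2.0, 3.0, 4.0]

full = layer.forward(x, width=1.0)   # largest sub-network: all 8 output units
half = layer.forward(x, width=0.5)   # smaller student: first 4 units, same weights
config = sample_config(rng)          # one randomly drawn student configuration
```

Because the smaller student only slices the shared matrix, its outputs coincide with the leading outputs of the full layer, which is what lets many students be drawn from one trained set of weights in this sketch.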
Findings
Supports diverse architectures within one trained model
Reduces computational costs compared to traditional ARD methods
Effectively maintains robustness across different student network configurations
Abstract
Adversarial Robustness Distillation (ARD) has emerged as an effective method to enhance the robustness of lightweight deep neural networks against adversarial attacks. Current ARD approaches have leveraged a large robust teacher network to train one robust lightweight student. However, due to the diverse range of edge devices and resource constraints, current approaches require training a new student network from scratch to meet specific constraints, leading to substantial computational costs and increased CO2 emissions. This paper proposes Progressive Adversarial Robustness Distillation (ProARD), enabling the efficient one-time training of a dynamic network that supports a diverse range of accurate and robust student networks without requiring retraining. We first make a dynamic deep neural network based on dynamic layers by encompassing variations in width, depth, and expansion in…
