BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation
Zheng Zhou, Wenquan Feng, Shuchang Lyu, Guangliang Cheng, Xiaowei Huang, Qi Zhao

TL;DR
BEARD is a comprehensive benchmark framework that systematically evaluates the adversarial robustness of dataset distillation methods across various attacks, datasets, and settings, facilitating reproducible research.
Contribution
This paper introduces BEARD, the first unified benchmark for assessing adversarial robustness in dataset distillation, including new metrics and a leaderboard.
Findings
BEARD enables standardized robustness evaluation across DD methods.
Adversarial training improves robustness in dataset distillation.
Benchmark results reveal varying robustness levels among DD techniques.
Abstract
Dataset Distillation (DD) is an emerging technique that compresses large-scale datasets into significantly smaller synthesized datasets while preserving high test performance and enabling the efficient training of large models. However, current research primarily focuses on enhancing evaluation accuracy under limited compression ratios, often overlooking critical security concerns such as adversarial robustness. A key challenge in evaluating this robustness lies in the complex interactions between distillation methods, model architectures, and adversarial attack strategies, which complicate standardized assessments. To address this, we introduce BEARD, an open and unified benchmark designed to systematically assess the adversarial robustness of DD methods, including DM, IDM, and BACON. BEARD encompasses a variety of adversarial attacks (e.g., FGSM, PGD, C&W) on distilled datasets like…
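The abstract lists FGSM among the attacks BEARD applies. The sketch below is a minimal, hedged illustration of what such an attack does. It is not BEARD's implementation. It runs one-step FGSM against a toy logistic-regression classifier in pure Python, then reports accuracy on the perturbed inputs. The model, data, and function names (`fgsm`, `robust_accuracy`) are hypothetical. In practice, benchmarks attack neural networks trained on distilled data and usually clip perturbed inputs to the valid pixel range. This sketch omits clipping.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: list[float], b: float, x: list[float]) -> float:
    """Linear score w.x + b of a toy logistic-regression model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w: list[float], b: float, x: list[float]) -> int:
    """Predict label 1 if the score is positive, else 0."""
    return 1 if score(w, b, x) > 0 else 0


def sign(g: float) -> int:
    return (g > 0) - (g < 0)


def fgsm(w: list[float], b: float, x: list[float], y: int, eps: float) -> list[float]:
    """One-step FGSM for labels y in {0, 1}.

    For logistic loss with p = sigmoid(w.x + b), the gradient of the loss
    with respect to the input is (p - y) * w. Each feature moves by eps in
    the sign of that gradient, which locally increases the loss.
    """
    p = sigmoid(score(w, b, x))
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def robust_accuracy(w, b, data, eps):
    """Fraction of (x, y) pairs still classified correctly after FGSM."""
    correct = sum(predict(w, b, fgsm(w, b, x, y, eps)) == y for x, y in data)
    return correct / len(data)


if __name__ == "__main__":
    w, b = [1.0, -1.0], 0.0  # hypothetical trained weights
    data = [([0.6, 0.2], 1), ([0.2, 0.6], 0), ([0.52, 0.5], 1), ([0.5, 0.55], 0)]
    print(robust_accuracy(w, b, data, eps=0.0))  # clean accuracy: 1.0
    print(robust_accuracy(w, b, data, eps=0.1))  # under FGSM: 0.5
```

In this example, FGSM flips the two points that lie close to the decision boundary. The two points with a larger margin survive. This drop from clean accuracy to accuracy under attack is the kind of gap a robustness benchmark measures.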
Peer Reviews
Decision: Submitted to ICLR 2026
1. The code, leaderboard, and data pools are open-sourced, which can help facilitate future research. 2. The adversarial game formalism is thoughtfully articulated.
1. The empirical results do not directly benchmark against some newer strategies for adversarial training (e.g., [1]), adversarial attacks (transformation-based attacks [2] and generative approaches [3]), and other widely used datasets (e.g., CINIC-10, ImageNet, and MNIST). 2. Section 5 reports trends but lacks deeper causal explanations (e.g., why DM improves CREI). 3. Section 3 introduces too many mathematical definitions but provides limited experimental interpretation or discussion later. 4.
1. The paper presents the first unified benchmark for adversarial robustness in dataset distillation, introducing a novel adversarial game framework and three tailored metrics (RR, AE, CREI). 2. As dataset distillation gains traction in resource-constrained settings, understanding its robustness is critical. BEARD provides a standardized platform for comparative evaluation. 3. The paper is well-structured, with clear descriptions of the framework, metrics, and experimental setup. The public re
1. Completeness is somewhat insufficient. This paper systematically expands and deepens DD-RobustBench by introducing unified evaluation metrics, incorporating more attack types, and proposing a game-theoretic framework, yet it does not demonstrate results on larger datasets or on more complex architectures and algorithms, and remains limited to relatively simple scenarios. 2. In Section 3.10, the CREI metric fixes α at 0.5 without explanation. Giving robustness and efficiency equal weight might no
- The paper is clear and well written - The addressed problem is relevant, and I think those benchmarks and their codebase are very valuable for the research community and can serve as a baseline both for attacks and defenses - The authors made a lot of effort to wrap together models, datasets, and attacks, and run a considerable amount of experiments
- I am concerned about the contribution, as it appears weak (particularly considering this venue), both from a technical and novelty point of view. The authors (although I recognize the hard work that has been made) "simply" wrap together existing works, whereas the most novel contribution appears to be the proposed metrics (on which I have some concern, see below). Additionally, there is a non-negligible overlap with the competing DD-RobustBench work, with only incremental improvements over it.
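Two of the reviews above raise concerns about CREI. One names CREI alongside RR and AE. Another objects that α is fixed at 0.5 so that robustness and efficiency get equal weight. The exact BEARD formula is not reproduced on this page. As an assumption for illustration only, the sketch below treats CREI as a convex combination α·RR + (1 − α)·AE of two scores that are already normalized to [0, 1]. The functions `crei` and `rank_methods` are hypothetical, and so are the method names and scores. They are not BEARD results.

```python
def crei(rr: float, ae: float, alpha: float = 0.5) -> float:
    """Hypothetical CREI: alpha * rr + (1 - alpha) * ae.

    Assumes rr (robustness) and ae (efficiency) share a normalized [0, 1] scale.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * rr + (1.0 - alpha) * ae


def rank_methods(scores: dict[str, tuple[float, float]], alpha: float) -> list[str]:
    """Order method names by their CREI value, highest first, for a given alpha."""
    return sorted(scores, key=lambda name: crei(*scores[name], alpha), reverse=True)


if __name__ == "__main__":
    # Hypothetical (rr, ae) pairs for two unnamed methods.
    scores = {"method_a": (0.8, 0.2), "method_b": (0.4, 0.7)}
    print(rank_methods(scores, alpha=0.5))  # ['method_b', 'method_a']
    print(rank_methods(scores, alpha=0.8))  # ['method_a', 'method_b']
```

Under this assumed form, the ranking of the two methods flips as α moves from 0.5 to 0.8. This is why the choice of α matters for a leaderboard built on such a combined index.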
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Fault Detection and Control Systems
Methods: Lib
