Backdoor Attacks Against Dataset Distillation
Yugeng Liu, Zheng Li, Michael Backes, Yun Shen, Yang Zhang

TL;DR
This paper introduces backdoor attacks against dataset distillation and shows that they remain effective, and bypass existing defenses, on image models trained on the distilled synthetic data.
Contribution
It presents the first backdoor attacks targeting models trained on distilled datasets, with two methods that inject triggers during the distillation process.
Findings
DOORPING achieves near-perfect attack success rates
Attacks bypass multiple defense mechanisms
Effective across various datasets and architectures
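The findings above are stated in terms of attack success rate (ASR), which this summary does not define. The sketch below assumes one common definition: among triggered test inputs whose true class is not the target class, the fraction that the model assigns to the target class. The function name and argument layout are illustrative and not taken from the paper, whose exact metric may differ (for example, by counting all triggered inputs).

```python
def attack_success_rate(predictions, true_labels, target_label):
    """Fraction of triggered inputs (true class != target) predicted as the target.

    predictions: labels predicted by the model on inputs that carry the trigger.
    true_labels: the original, clean labels of those same inputs.
    target_label: the class the attacker wants triggered inputs mapped to.
    """
    # Inputs that already belong to the target class say nothing about the
    # backdoor, so this assumed definition excludes them.
    eligible = [p for p, y in zip(predictions, true_labels) if y != target_label]
    if not eligible:
        return 0.0
    hits = sum(1 for p in eligible if p == target_label)
    return hits / len(eligible)
```

Under this definition, a "near-perfect" ASR means that almost every triggered input from a non-target class is classified as the target.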
Abstract
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two…
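The abstract says triggers are injected into the synthetic data during distillation, but the truncated text does not describe the two proposed methods. As a hedged illustration of the generic building block only, the sketch below stamps a fixed patch trigger onto a fraction of images and relabels them to a target class, as one might do before handing data to a distillation step. It is not the paper's DOORPING procedure. All names (`stamp_trigger`, `poison_dataset`, `poison_ratio`) and the 2-D grayscale list representation are assumptions made for this example.

```python
import random


def stamp_trigger(image, patch_value=1.0, patch_size=2):
    """Return a copy of a 2-D grayscale image with a square patch
    written into its bottom-right corner (hypothetical trigger)."""
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = patch_value
    return stamped


def poison_dataset(images, labels, target_label, poison_ratio=0.1, seed=0):
    """Stamp the trigger on a random fraction of images and relabel them.

    Returns new image and label lists plus the set of poisoned indices;
    the inputs are left unmodified.
    """
    rng = random.Random(seed)
    n_poison = int(len(images) * poison_ratio)
    chosen = set(rng.sample(range(len(images)), n_poison))
    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in chosen:
            new_images.append(stamp_trigger(img))
            new_labels.append(target_label)
        else:
            new_images.append([row[:] for row in img])
            new_labels.append(lab)
    return new_images, new_labels, chosen
```

A distillation algorithm run on the returned data would then compress both the clean and the triggered samples into the synthetic set. According to the abstract, the paper's attacks operate within the distillation procedure itself instead of at the later model training stage.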
Taxonomy
Topics: Adversarial Robustness in Machine Learning · COVID-19 diagnosis using AI · Machine Learning in Healthcare
