FairDD: Fair Dataset Distillation
Qihang Zhou, Shenhao Fang, Shibo He, Wenchao Meng, Jiming Chen

TL;DR
FairDD is a dataset distillation method that improves fairness with respect to protected attributes such as gender and race in synthetic datasets, addressing the bias of conventional distillation methods without sacrificing accuracy.
Contribution
The paper proposes FairDD, a fair dataset distillation framework that synchronously matches synthetic data to protected-attribute-wise groups of the original data, improving fairness across diverse matching-based distillation approaches.
Findings
FairDD significantly improves fairness over vanilla dataset distillation.
FairDD maintains high accuracy while enhancing fairness.
The method applies to diverse matching-based distillation approaches without modifying their original architectures.
Abstract
Condensing large datasets into smaller synthetic counterparts has demonstrated its promise for image classification. However, previous research has overlooked a crucial concern in image recognition: ensuring that models trained on condensed datasets are unbiased towards protected attributes (PA), such as gender and race. Our investigation reveals that dataset distillation fails to alleviate the unfairness towards minority groups within original datasets. Moreover, this bias typically worsens in the condensed datasets due to their smaller size. To bridge the research gap, we propose a novel fair dataset distillation (FDD) framework, namely FairDD, which can be seamlessly applied to diverse matching-based DD approaches (DDs), requiring no modifications to their original architectures. The key innovation of FairDD lies in synchronously matching synthetic datasets to PA-wise groups of…
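The difference between matching the whole dataset and matching protected-attribute groups can be illustrated with a toy NumPy sketch. Because the abstract above is truncated, this is not the paper's loss: the mean-feature (distribution-matching style) objective, the function names `vanilla_matching_loss` and `groupwise_matching_loss`, and the 90/10 toy data are all assumptions chosen for illustration.

```python
import numpy as np


def vanilla_matching_loss(syn_feats, real_feats):
    """Squared distance between the synthetic mean and the overall real mean."""
    diff = syn_feats.mean(axis=0) - real_feats.mean(axis=0)
    return float(np.sum(diff ** 2))


def groupwise_matching_loss(syn_feats, real_feats, groups):
    """Average squared distance between the synthetic mean and each group mean.

    Every protected-attribute group contributes equally, regardless of size.
    (Hypothetical objective for illustration, not the paper's exact loss.)
    """
    syn_mean = syn_feats.mean(axis=0)
    losses = []
    for g in np.unique(groups):
        group_mean = real_feats[groups == g].mean(axis=0)
        losses.append(np.sum((syn_mean - group_mean) ** 2))
    return float(np.mean(losses))


# Toy features for one class: 90 majority samples near -1, 10 minority near +1.
rng = np.random.default_rng(0)
majority = rng.normal(-1.0, 0.1, size=(90, 2))
minority = rng.normal(+1.0, 0.1, size=(10, 2))
real = np.vstack([majority, minority])
groups = np.array([0] * 90 + [1] * 10)

# The minimizer of the vanilla loss is the overall mean (pulled toward the majority).
vanilla_target = real.mean(axis=0)

# The minimizer of the group-wise loss is the mean of the group means.
group_target = np.mean(
    [real[groups == g].mean(axis=0) for g in np.unique(groups)], axis=0
)

# Evaluate a single synthetic point placed at each target.
loss_at_vanilla = groupwise_matching_loss(vanilla_target[None, :], real, groups)
loss_at_group = groupwise_matching_loss(group_target[None, :], real, groups)
```

In this toy setting the vanilla target sits close to the majority group, while the group-wise target sits roughly midway between the two groups, which is the intuition behind weighting groups equally rather than by sample count.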
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Data Classification
