FairDD: Fair Dataset Distillation

Qihang Zhou; Shenhao Fang; Shibo He; Wenchao Meng; Jiming Chen

arXiv:2411.19623 · cs.CV · October 14, 2025

Open Access

TL;DR

FairDD introduces a novel dataset distillation method that enhances fairness towards protected attributes like gender and race in synthetic datasets, addressing bias issues in traditional methods without sacrificing accuracy.

Contribution

The paper proposes FairDD, a fair dataset distillation framework that synchronously matches synthetic data to protected-attribute (PA) groups of the original dataset. It can be applied to diverse matching-based distillation approaches without modifying their architectures.

Findings

01

FairDD significantly improves fairness over vanilla dataset distillation.

02

FairDD maintains high accuracy while enhancing fairness.

03

The method applies to diverse matching-based distillation techniques.

Abstract

Condensing large datasets into smaller synthetic counterparts has demonstrated its promise for image classification. However, previous research has overlooked a crucial concern in image recognition: ensuring that models trained on condensed datasets are unbiased towards protected attributes (PA), such as gender and race. Our investigation reveals that dataset distillation fails to alleviate the unfairness towards minority groups within original datasets. Moreover, this bias typically worsens in the condensed datasets due to their smaller size. To bridge the research gap, we propose a novel fair dataset distillation (FDD) framework, namely FairDD, which can be seamlessly applied to diverse matching-based DD approaches (DDs), requiring no modifications to their original architectures. The key innovation of FairDD lies in synchronously matching synthetic datasets to PA-wise groups of…
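The contrast the abstract draws, between matching the synthetic set to the whole dataset and matching it to PA-wise groups, can be illustrated with a toy sketch. The code below assumes a simple feature-mean matching objective in NumPy. The function names, the 1-D features, and the 90/10 group split are illustrative assumptions, not the authors' implementation, and a real pipeline would match per class on network embeddings or gradients.

```python
import numpy as np

def vanilla_matching_loss(real_feats, syn_feats):
    """Squared distance between the overall real mean and the synthetic mean.
    The overall mean is dominated by whichever group has more samples."""
    diff = real_feats.mean(axis=0) - syn_feats.mean(axis=0)
    return float(np.sum(diff ** 2))

def groupwise_matching_loss(real_feats, groups, syn_feats):
    """Average squared distance between each PA group's mean and the synthetic mean.
    Every group contributes equally, regardless of its size (hypothetical objective)."""
    syn_mean = syn_feats.mean(axis=0)
    losses = []
    for g in np.unique(groups):
        group_mean = real_feats[groups == g].mean(axis=0)
        losses.append(np.sum((group_mean - syn_mean) ** 2))
    return float(np.mean(losses))

# Toy data: a majority group (90 samples at 0.0) and a minority group (10 samples at 1.0).
real_feats = np.concatenate([np.zeros((90, 1)), np.ones((10, 1))])
groups = np.array([0] * 90 + [1] * 10)

# Target of the vanilla objective: the overall mean, pulled toward the majority group.
vanilla_target = real_feats.mean(axis=0)

# Target of the group-wise objective: the mean of the per-group means.
fair_target = np.mean(
    [real_feats[groups == g].mean(axis=0) for g in np.unique(groups)], axis=0
)

# Synthetic sets placed at each target, to compare the two objectives.
syn_at_vanilla = np.full((5, 1), vanilla_target[0])
syn_at_fair = np.full((5, 1), fair_target[0])

print("vanilla target:", vanilla_target, "group-wise target:", fair_target)
print("group-wise loss at vanilla target:",
      groupwise_matching_loss(real_feats, groups, syn_at_vanilla))
print("group-wise loss at fair target:",
      groupwise_matching_loss(real_feats, groups, syn_at_fair))
```

In this toy setup the vanilla target sits close to the majority group, while the group-wise target sits midway between the two groups, which is one way to picture why a group-balanced objective could reduce bias toward the majority.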

Taxonomy

Topics: Data Stream Mining Techniques · Machine Learning and Data Classification