Generating Synthetic Fair Syntax-agnostic Data by Learning and Distilling Fair Representation
Md Fahim Sikder, Resmi Ramachandranpillai, Daniel de Leng, Fredrik Heintz

TL;DR
This paper introduces a novel fair data generation method using knowledge distillation to create syntax-agnostic, fair synthetic data with improved fairness, quality, and utility over existing models, while reducing computational complexity.
Contribution
It proposes a flexible, stable fair generative approach based on latent space distillation that enhances fairness and data utility with lower computational demands.
Findings
Achieved 5% improvement in fairness
Achieved 5% improvement in synthetic sample quality
Achieved 10% improvement in data utility
Abstract
Data fairness is a crucial topic given the recent wide use of AI-powered applications. Most real-world data carries human or machine biases, and when such data is used to train AI models, the models may reflect the bias in the training data. Existing bias-mitigating generative methods based on GANs and diffusion models need in-processing fairness objectives and fail to consider computational overhead when choosing computationally heavy architectures, which may lead to high computational demands, instability, and poor optimization performance. To mitigate this issue, in this work, we present a fair data generation technique based on knowledge distillation, where we use a small architecture to distill the fair representation in the latent space. The idea of fair latent space distillation enables more flexible and stable training of Fair…
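To make the idea of latent space distillation concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not state the loss or the architectures, so the mean-squared-error objective, the linear student, and the names `teacher_encode`, `student_encode`, `distill_loss`, `train_student`, and `make_data` are all assumptions chosen for illustration. The teacher is a fixed stand-in for an encoder that is assumed to already produce a fair representation; the sketch only shows the student learning to match those latents and does not itself enforce fairness.

```python
import random


def make_data(n, seed=1):
    # Hypothetical toy inputs: n points with 3 features each in [-1, 1].
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]


def teacher_encode(x):
    # Stand-in for a pretrained encoder whose 2-D latent is ASSUMED to be fair.
    # A real teacher would be a trained network; this is a fixed linear map.
    return [0.5 * x[0] - 0.2 * x[1], 0.1 * x[0] + 0.3 * x[2]]


def student_encode(w, x):
    # Small student: a single linear layer with weight matrix w (latent_dim x in_dim).
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def distill_loss(w, data):
    # Mean squared distance between student and teacher latents (assumed objective).
    total = 0.0
    for x in data:
        t = teacher_encode(x)
        s = student_encode(w, x)
        total += sum((si - ti) ** 2 for si, ti in zip(s, t))
    return total / len(data)


def train_student(data, latent_dim=2, lr=0.3, epochs=200, seed=0):
    # Full-batch gradient descent on the distillation loss.
    rng = random.Random(seed)
    in_dim = len(data[0])
    w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(latent_dim)]
    n = len(data)
    for _ in range(epochs):
        grad = [[0.0] * in_dim for _ in range(latent_dim)]
        for x in data:
            t = teacher_encode(x)
            s = student_encode(w, x)
            for k in range(latent_dim):
                err = 2.0 * (s[k] - t[k]) / n
                for j in range(in_dim):
                    grad[k][j] += err * x[j]
        for k in range(latent_dim):
            for j in range(in_dim):
                w[k][j] -= lr * grad[k][j]
    return w


if __name__ == "__main__":
    data = make_data(64)
    w = train_student(data)
    print("distillation loss after training:", distill_loss(w, data))
```

In a setup like the one the abstract describes, the student's latents would then feed a generator that produces synthetic samples; that stage is omitted here because the summary does not describe it.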
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Diffusion
