Generating Synthetic Fair Syntax-agnostic Data by Learning and Distilling Fair Representation

Md Fahim Sikder; Resmi Ramachandranpillai; Daniel de Leng; Fredrik Heintz

arXiv:2408.10755·cs.LG·August 21, 2024

Open Access

TL;DR

This paper introduces a novel fair data generation method using knowledge distillation to create syntax-agnostic, fair synthetic data with improved fairness, quality, and utility over existing models, while reducing computational complexity.

Contribution

The paper proposes a flexible, stable fair generative approach based on latent space distillation that enhances fairness and data utility while lowering computational demands.

Findings

01. Achieved a 5% improvement in fairness over existing fair generative models

02. Achieved a 5% improvement in synthetic sample quality

03. Achieved a 10% improvement in data utility

Abstract

Data fairness is a crucial topic given the recent widespread use of AI-powered applications. Most real-world data contain human or machine biases, and when such data are used to train AI models, the models may reproduce those biases. Existing bias-mitigating generative methods based on GANs or diffusion models require in-processing fairness objectives and adopt computationally heavy architectures without accounting for their overhead, which can lead to high computational demands, training instability, and poor optimization performance. To address this, we present a fair data generation technique based on knowledge distillation, in which a small architecture distills the fair representation in the latent space. The idea of fair latent space distillation enables more flexible and stable training of Fair…
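The central step named in the abstract, distilling a representation from a larger model into a small architecture in latent space, can be illustrated with a generic knowledge-distillation sketch. This is not the authors' method: the "teacher" here is a hypothetical frozen two-layer encoder standing in for a pretrained fair representation model, the student is a single linear layer, the distillation loss is plain mean squared error between latent codes, and the data are random toy features. No fairness objective is modeled.

```python
import numpy as np


def teacher_encode(x, w1, w2):
    """Hypothetical frozen 'teacher': a two-layer encoder standing in for a
    pretrained fair representation model. It is never updated below."""
    return np.tanh(np.tanh(x @ w1) @ w2)


def distill_student(x, teacher_z, latent_dim, lr=0.1, steps=500, seed=0):
    """Train a much smaller student (one linear layer, no hidden units) so that
    its latent codes match the teacher's, using mean squared error as the
    distillation loss and full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=(x.shape[1], latent_dim))
    n = x.shape[0]
    losses = []
    for _ in range(steps):
        err = x @ w - teacher_z                  # student latents minus teacher latents
        losses.append(float(np.mean(err ** 2)))  # MSE averaged over all latent entries
        grad = 2.0 * x.T @ err / (n * latent_dim)  # exact gradient of that MSE w.r.t. w
        w -= lr * grad
    return w, losses


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.normal(size=(256, 8))               # toy features (synthetic, illustrative only)
    w1 = rng.normal(size=(8, 64)) / np.sqrt(8)  # teacher: 8 -> 64 -> 2 (640 weights)
    w2 = rng.normal(size=(64, 2)) / np.sqrt(64)
    teacher_z = teacher_encode(x, w1, w2)
    w_student, losses = distill_student(x, teacher_z, latent_dim=2)  # student: 16 weights
    print(f"distillation MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Only the student's small parameter set is optimized while the teacher stays frozen, which is where the computational savings of distillation come from. In the setting the abstract describes, the distilled latent space is meant to carry the teacher's fair representation; checking that property would require a fairness metric, which this sketch deliberately leaves out.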

Taxonomy

Topics: Hate Speech and Cyberbullying Detection

Methods: Diffusion