DiffInject: Revisiting Debias via Synthetic Data Generation using Diffusion-based Style Injection

Donggeun Ko; Sangwoo Jo; Dongjun Lee; Namjun Park; Jaekwang Kim

arXiv:2406.06134 · cs.CV · June 11, 2024

TL;DR

DiffInject leverages diffusion models to generate synthetic bias-conflict samples, significantly reducing dataset bias in a fully unsupervised manner, without requiring explicit bias labels or types.

Contribution

The paper introduces a diffusion-based data augmentation method for debiasing that operates without explicit bias labels, advancing the use of generative models in bias mitigation.

Findings

01. Effective reduction of dataset bias demonstrated

02. Operates without explicit bias labels

03. Utilizes latent space manipulation in diffusion models

Abstract

Dataset bias is a significant challenge in machine learning, where specific attributes, such as the texture or color of images, are unintentionally learned, resulting in detrimental performance. To address this, previous efforts have focused on debiasing models either by developing novel debiasing algorithms or by generating synthetic data to mitigate the prevalent dataset biases. However, generative approaches to date have largely relied on using bias-specific samples from the dataset, which are typically too scarce. In this work, we propose DiffInject, a straightforward yet powerful method to augment synthetic bias-conflict samples using a pretrained diffusion model. This approach significantly advances the use of diffusion models for debiasing purposes by manipulating the latent space. Our framework does not require any explicit knowledge of the bias types or labelling, making it a…
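The abstract does not say how the latent space is manipulated, so the following is only a toy sketch of one generic way to "inject" a style latent into a content latent: linear blending. The function `inject_style`, the blend weight `gamma`, and the toy vectors are all hypothetical names chosen for illustration; none of this is taken from the paper, and a real pipeline would operate on feature maps inside a pretrained diffusion model rather than on short lists.

```python
from typing import List


def inject_style(content: List[float], style: List[float], gamma: float) -> List[float]:
    """Linearly blend a content latent with a style latent (illustrative assumption).

    gamma = 0.0 returns the content latent unchanged;
    gamma = 1.0 returns the style latent.
    """
    if len(content) != len(style):
        raise ValueError("latents must have the same length")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in [0, 1]")
    return [(1.0 - gamma) * c + gamma * s for c, s in zip(content, style)]


# Hypothetical toy latents: 'content' stands in for a bias-aligned sample,
# 'style' for a bias-conflict sample whose attributes are injected.
content_latent = [1.0, 0.0, 2.0]
style_latent = [0.0, 4.0, 2.0]

mixed = inject_style(content_latent, style_latent, gamma=0.25)
print(mixed)
```

In this sketch a small `gamma` keeps most of the content latent while shifting it toward the style latent. How strongly to blend, and at which layer or timestep, are design choices the abstract leaves open.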

Taxonomy

Topics: Music and Audio Processing

Methods: Diffusion