Right for the Right Latent Factors: Debiasing Generative Models via Disentanglement
Xiaoting Shao, Karl Stelzner, Kristian Kersting

TL;DR
This paper introduces a method that debiases generative models by using human feedback to disentangle their internal representations, reducing bias and improving disentanglement quality.
Contribution
The paper presents itself as the first to address bias in generative models, an area it describes as overlooked so far, by using disentanglement guided by human feedback, with the aim of enhancing fairness and interpretability.
Findings
Effective bias removal with limited human feedback
Strong disentanglement results compared to recent methods
Generative models exhibit Clever-Hans-like bias behaviors
Abstract
A key assumption of most statistical machine learning methods is that they have access to independent samples from the distribution of data they encounter at test time. As such, these methods often perform poorly in the face of biased data, which breaks this assumption. In particular, machine learning models have been shown to exhibit Clever-Hans-like behaviour, meaning that spurious correlations in the training set are inadvertently learnt. A number of works have been proposed to revise deep classifiers to learn the right correlations. However, generative models have been overlooked so far. We observe that generative models are also prone to Clever-Hans-like behaviour. To counteract this issue, we propose to debias generative models by disentangling their internal representations, which is achieved via human feedback. Our experiments show that this is effective at removing bias even…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Neural Networks and Applications · Machine Learning and Data Classification
