Leveraging Contaminated Datasets to Learn Clean-Data Distribution with Purified Generative Adversarial Networks
Bowen Tian, Qinliang Su, Jianxing Yu

TL;DR
This paper introduces PuriGAN, a novel GAN framework designed to learn clean data distributions from contaminated datasets by incorporating an extra contamination dataset, with proven convergence and improved performance in image generation and downstream tasks.
Contribution
The paper proposes PuriGAN, a new GAN variant that effectively learns from contaminated datasets by distinguishing target from contaminated instances, with theoretical guarantees and practical improvements.
Findings
PuriGAN converges to the desired data distribution under mild conditions.
PuriGAN outperforms baselines in generating clean images from contaminated data.
PuriGAN achieves superior results in semi-supervised anomaly detection and PU-learning tasks.
Abstract
Generative adversarial networks (GANs) are known for their strong ability to capture the underlying distribution of training instances. Since the seminal GAN work, many variants have been proposed. However, almost all existing GANs assume that the training dataset is clean. In many real-world applications this assumption does not hold: the training dataset may be contaminated by a proportion of undesired instances. When trained on such datasets, existing GANs learn a mixture distribution of desired and contaminated instances rather than the distribution of the desired data alone (the target distribution). To learn the target distribution from contaminated datasets, two purified generative adversarial networks (PuriGAN) are developed, in which the discriminators are augmented with the capability to distinguish between target and contaminated…
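The mixture claim in the abstract can be written out as a short worked equation. This is only an illustrative sketch: the symbols p_t (target distribution), p_c (contamination distribution), p_g (generator distribution) and γ (contamination proportion) are notation assumed here, not taken from this listing, and the equations describe a standard GAN rather than the PuriGAN objective itself.

```latex
% Assumed notation (not from the listing):
%   p_t    : target (clean) distribution
%   p_c    : contamination distribution
%   \gamma : proportion of contaminated instances, 0 < \gamma < 1
\begin{align}
  p_{\text{data}}(x) &= (1-\gamma)\, p_t(x) + \gamma\, p_c(x) \\
  % At its optimum, a standard GAN's generator matches the training distribution:
  p_g^{*}(x) &= p_{\text{data}}(x) = (1-\gamma)\, p_t(x) + \gamma\, p_c(x) \\
  % so the learned distribution differs from the target unless the two components coincide:
  p_g^{*}(x) - p_t(x) &= \gamma \bigl( p_c(x) - p_t(x) \bigr)
\end{align}
```

Under these assumptions, the gap between the learned and target distributions scales with γ, which is why the contamination has to be handled inside the training objective rather than ignored.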
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection
