Leveraging Contaminated Datasets to Learn Clean-Data Distribution with Purified Generative Adversarial Networks

Bowen Tian; Qinliang Su; Jianxing Yu

arXiv:2302.01722 · cs.LG · February 6, 2023

TL;DR

This paper introduces PuriGAN, a novel GAN framework designed to learn clean data distributions from contaminated datasets by incorporating an extra contamination dataset, with proven convergence and improved performance in image generation and downstream tasks.

Contribution

The paper proposes PuriGAN, a new GAN variant that effectively learns from contaminated datasets by distinguishing target from contaminated instances, with theoretical guarantees and practical improvements.

Findings

1. PuriGAN converges to the desired data distribution under mild conditions.
2. PuriGAN outperforms baselines in generating clean images from contaminated data.
3. PuriGAN achieves superior results in semi-supervised anomaly detection and positive-unlabeled (PU) learning tasks.

Abstract

Generative adversarial networks (GANs) are known for their strong ability to capture the underlying distribution of training instances. Since the seminal work on GANs, many variants have been proposed. However, almost all existing GANs are built on the assumption that the training dataset is clean. In many real-world applications this assumption may not hold: the training dataset may be contaminated by a proportion of undesired instances. When trained on such datasets, existing GANs learn a mixture distribution of desired and contaminated instances, rather than the distribution of the desired data only (the target distribution). To learn the target distribution from contaminated datasets, two purified generative adversarial networks (PuriGAN) are developed, in which the discriminators are augmented with the capability to distinguish between target and contaminated…
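
The mixture problem described in the abstract can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that the contaminated data follow a two-component mixture p_u = (1 - γ) p_t + γ p_c with a known contamination proportion γ, and that the extra contamination dataset gives access to the contamination distribution p_c. The helper names `mixture` and `recover_target` are hypothetical, and this is not PuriGAN's training objective; it only shows why a model fit to p_u alone reproduces the mixture, while knowledge of p_c allows the target distribution to be separated out.

```python
import numpy as np


def mixture(p_target: np.ndarray, p_contam: np.ndarray, gamma: float) -> np.ndarray:
    """Contaminated distribution under the assumed model p_u = (1 - gamma) p_t + gamma p_c."""
    return (1.0 - gamma) * p_target + gamma * p_contam


def recover_target(p_contaminated: np.ndarray, p_contam: np.ndarray, gamma: float) -> np.ndarray:
    """Invert the assumed mixture: p_t = (p_u - gamma p_c) / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    p_t = (p_contaminated - gamma * p_contam) / (1.0 - gamma)
    # Clip tiny negatives that estimation noise could introduce, then renormalize.
    p_t = np.clip(p_t, 0.0, None)
    return p_t / p_t.sum()


if __name__ == "__main__":
    p_t = np.array([0.5, 0.5, 0.0, 0.0])  # hypothetical clean (target) distribution
    p_c = np.array([0.0, 0.0, 0.5, 0.5])  # hypothetical contamination distribution
    gamma = 0.2                            # assumed contamination proportion
    p_u = mixture(p_t, p_c, gamma)         # what a GAN trained on the raw data would fit
    print("contaminated:", p_u)
    print("recovered target:", recover_target(p_u, p_c, gamma))
```

With p_t = [0.5, 0.5, 0, 0], p_c = [0, 0, 0.5, 0.5] and γ = 0.2, the contaminated distribution is [0.4, 0.4, 0.1, 0.1], which is what an ordinary GAN would be asked to match; removing the known contamination component recovers [0.5, 0.5, 0, 0].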

Videos

Leveraging Contaminated Datasets to Learn Clean-Data Distribution with Purified Generative Adversarial Networks (Underline)

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection