DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation

Zelin Zang; Hao Luo; Kai Wang; Panpan Zhang; Fan Wang; Stan Z. Li; Yang You

arXiv:2309.07909 · cs.LG · May 28, 2024

TL;DR

DiffAug is a diffusion-based data augmentation method for unsupervised contrastive learning. It generates positive samples without domain knowledge or large external datasets and improves representation quality across diverse data types.

Contribution

The paper presents DiffAug, a novel diffusion model-based augmentation technique that enhances unsupervised contrastive learning without requiring domain knowledge, supervision, or large-scale external data.

Findings

01 Outperforms existing augmentation methods on multiple datasets

02 Improves representation quality in unsupervised contrastive learning

03 Works effectively across DNA, visual, and bio-feature data

Abstract

Unsupervised contrastive learning has gained prominence in fields such as vision and biology, leveraging predefined positive/negative samples for representation learning. Data augmentation, categorized into hand-designed and model-based methods, has been identified as a crucial component for enhancing contrastive learning. However, hand-designed methods require human expertise in domain-specific data and sometimes distort the meaning of the data. In contrast, generative model-based approaches usually require supervision or large-scale external data, which has become a bottleneck constraining model training in many domains. To address these problems, this paper proposes DiffAug, a novel unsupervised contrastive learning technique with diffusion model-based positive data generation. DiffAug consists of a semantic encoder and a conditional diffusion model; the conditional…
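
As a rough illustration of why the quality of positive samples matters in contrastive learning, the sketch below computes a generic InfoNCE-style loss for one anchor, one positive, and a few negatives. This is a standard textbook formulation, not necessarily the loss used in the paper; the function names (`cosine`, `info_nce`), the temperature value, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE-style loss (illustrative, not the paper's objective).

    The loss is low when the anchor is closer to its positive
    than to any of the negatives.
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - pos_logit


# Toy embeddings (hypothetical values).
anchor = [1.0, 0.0]
good_positive = [0.9, 0.1]   # semantically close to the anchor
bad_positive = [0.0, 1.0]    # a "distorted" positive, unrelated to the anchor
negatives = [[-1.0, 0.0], [0.0, -1.0]]

loss_good = info_nce(anchor, good_positive, negatives)
loss_bad = info_nce(anchor, bad_positive, negatives)
print(loss_good < loss_bad)
```

In this toy setting, a positive that preserves the anchor's meaning yields a lower loss than a distorted one, which is the kind of distortion the abstract attributes to some hand-designed augmentations.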

Code & Models

Repositories

zangzelin/code_diffaug (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research · AI in cancer detection

Methods: Contrastive Learning · Diffusion