Synthetic Augmentation with Large-scale Unconditional Pre-training
Jiarong Ye, Haomiao Ni, Peng Jin, Sharon X. Huang, Yuan Xue

TL;DR
HistoDiffusion is a latent diffusion model that is pre-trained on unlabeled data and then fine-tuned with classifier guidance to generate realistic, category-specific images, significantly improving medical image classification accuracy with minimal labeled data.
Contribution
The paper introduces HistoDiffusion, a novel synthetic augmentation method pre-trained on unlabeled data and fine-tuned with classifier guidance, reducing dependence on labeled datasets.
Findings
Pre-training on unlabeled datasets enhances image synthesis quality.
Synthetic augmentation improves classification accuracy by 6.4%.
The method generalizes effectively to unseen histopathology datasets.
Abstract
Deep learning-based medical image recognition systems often require a substantial amount of training data with expert annotations, which can be expensive and time-consuming to obtain. Recently, synthetic augmentation techniques have been proposed to mitigate this issue by generating realistic images conditioned on class labels. However, the effectiveness of these methods heavily depends on the representation capability of the trained generative model, which cannot be guaranteed without sufficient labeled training data. To further reduce the dependency on annotated data, we propose a synthetic augmentation method called HistoDiffusion, which can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training. In particular, we train a latent diffusion model (LDM) on diverse unlabeled datasets to learn common features and generate…
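To make the classifier-guidance idea concrete, the following is a minimal sketch of how a noise prediction can be steered toward a target class in latent space. It uses the standard classifier-guidance formulation from the diffusion literature (the predicted noise is shifted by the scaled gradient of the classifier's log-probability), not necessarily the exact procedure used in HistoDiffusion. The toy linear softmax classifier and all names (`log_prob_grad`, `guided_noise`, `weights`, `scale`) are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_prob_grad(z, weights, biases, target):
    """Gradient of log p(target | z) w.r.t. z for a toy linear softmax classifier.

    For logits = W z + b, the gradient is W[target] - sum_k p_k * W[k].
    The classifier is a hypothetical stand-in for a latent-space classifier.
    """
    logits = [sum(w_d * z_d for w_d, z_d in zip(w, z)) + b
              for w, b in zip(weights, biases)]
    probs = softmax(logits)
    return [weights[target][d] - sum(p * w[d] for p, w in zip(probs, weights))
            for d in range(len(z))]


def guided_noise(eps, z, weights, biases, target, alpha_bar_t, scale):
    """Shift the unconditional noise prediction toward the target class.

    eps_hat = eps - sqrt(1 - alpha_bar_t) * scale * grad_z log p(target | z)
    """
    grad = log_prob_grad(z, weights, biases, target)
    coeff = math.sqrt(1.0 - alpha_bar_t) * scale
    return [e - coeff * g for e, g in zip(eps, grad)]


if __name__ == "__main__":
    # Two classes separated along the first latent dimension (assumed toy setup).
    W = [[1.0, 0.0], [-1.0, 0.0]]
    b = [0.0, 0.0]
    print(guided_noise([0.0, 0.0], [0.0, 0.0], W, b,
                       target=0, alpha_bar_t=0.75, scale=2.0))
```

With `scale` set to 0 the function returns the unconditional noise prediction unchanged; larger values push samples more strongly toward the target class, typically at some cost in diversity.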
Taxonomy
Topics: AI in cancer detection · Generative Adversarial Networks and Image Synthesis · Radiomics and Machine Learning in Medical Imaging
Methods: Latent Diffusion Model · Diffusion
