Leveraging Diffusion Models for Synthetic Data Augmentation in Protein Subcellular Localization Classification
Sylvey Lin, Zhi-Yi Cao

TL;DR
This study explores using diffusion models to generate synthetic images for improving protein subcellular localization classification, revealing benefits and challenges of synthetic data augmentation in biomedical imaging.
Contribution
We introduce a simplified class-conditional diffusion model and hybrid training strategies for synthetic data augmentation in protein localization classification.
Findings
Synthetic data improved validation performance but did not improve generalization to unseen test data.
Baseline ResNet classifiers with conventional loss functions were more stable and performed better at test time than the synthetic-augmentation methods.
Realistic data generation is crucial for effective augmentation in biomedical tasks.
Abstract
We investigate whether synthetic images generated by diffusion models can enhance multi-label classification of protein subcellular localization. Specifically, we implement a simplified class-conditional denoising diffusion probabilistic model (DDPM) to produce label-consistent samples and explore their integration with real data via two hybrid training strategies: Mix Loss and Mix Representation. While these approaches yield promising validation performance, our proposed MixModel exhibits poor generalization to unseen test data, underscoring the challenges of leveraging synthetic data effectively. In contrast, baseline classifiers built on ResNet backbones with conventional loss functions demonstrate greater stability and test-time performance. Our findings highlight the importance of realistic data generation and robust supervision when incorporating generative augmentation into…
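For readers unfamiliar with DDPMs, the forward (noising) process that such a model learns to invert can be sketched as follows. This is a minimal illustration of the standard DDPM closed form x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε, not the authors' implementation. The linear schedule, the 1000 steps, the image size, and the names `linear_beta_schedule` and `q_sample` are all assumptions made for this example. In a class-conditional setup, the label vector would be passed to the denoising network alongside x_t and t; the noising step itself does not depend on the labels.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t from q(x_t | x_0) in closed form.

    x0:        clean images, shape (batch, channels, height, width)
    t:         integer timestep per image, shape (batch,)
    alpha_bar: cumulative product of (1 - beta), shape (num_steps,)
    Returns the noised images and the noise that was added; the noise
    is the regression target for the denoiser in the usual DDPM loss.
    """
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t].reshape(-1, 1, 1, 1)  # broadcast over C, H, W
    xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return xt, eps


# Hypothetical toy setup: 4 images, 5 localization classes.
num_steps = 1000
betas = linear_beta_schedule(num_steps)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 3, 32, 32))  # stand-in for real images
labels = rng.integers(0, 2, size=(4, 5))          # multi-hot labels (conditioning input)
t = rng.integers(0, num_steps, size=4)

xt, eps = q_sample(x0, t, alpha_bar, rng)
# A denoiser eps_theta(xt, t, labels) would be trained to predict eps.
```

Because ᾱ_t shrinks toward zero as t grows, late timesteps are almost pure Gaussian noise, which is why sampling can start from noise and denoise step by step under the desired label.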
Taxonomy
Topics: Cell Image Analysis Techniques · Machine Learning in Bioinformatics · Medical Image Segmentation Techniques
Methods: Average Pooling · Convolution · Diffusion · Kaiming Initialization · Global Average Pooling · Max Pooling
