Structural Similarity: When to Use Deep Generative Models on Imbalanced Image Dataset Augmentation
Chenqi Guo, Fabian Benitez-Quiroz, Qianli Feng, Aleix Martinez

TL;DR
This paper investigates when deep generative models are effective for augmenting imbalanced image datasets, introducing a similarity metric to predict augmentation success and analyzing the relationship between dataset similarity and accuracy gains.
Contribution
The paper proposes a new similarity metric, SSIM-supSubCls, to determine the effectiveness of generative augmentation on imbalanced datasets and analyzes its correlation with accuracy improvements.
Findings
The accuracy improvement from generative augmentation decays exponentially as SSIM-supSubCls increases.
The proposed metric effectively predicts augmentation success.
Deep generative models are most beneficial when class similarity is low.
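The exponential-decay finding above can be illustrated with a small curve fit. The sketch below is not the paper's analysis code: it assumes a model of the form gain = a * exp(-b * s), where s is a similarity score and gain is a strictly positive accuracy improvement, and fits it by ordinary least squares on log(gain). The function name `fit_exponential_decay` and the sample numbers are hypothetical and chosen only for illustration.

```python
import math


def fit_exponential_decay(similarities, gains):
    """Fit gain ~= a * exp(-b * s) by least squares on log(gain).

    Assumed model form for illustration; returns the pair (a, b).
    """
    if len(similarities) != len(gains) or len(similarities) < 2:
        raise ValueError("need two or more (similarity, gain) pairs")
    if any(g <= 0 for g in gains):
        raise ValueError("log-linear fit needs strictly positive gains")

    n = len(similarities)
    logs = [math.log(g) for g in gains]
    mean_s = sum(similarities) / n
    mean_y = sum(logs) / n

    var_s = sum((s - mean_s) ** 2 for s in similarities)
    if var_s == 0:
        raise ValueError("similarities must not all be equal")
    cov = sum((s - mean_s) * (y - mean_y) for s, y in zip(similarities, logs))

    slope = cov / var_s                  # equals -b in the assumed model
    intercept = mean_y - slope * mean_s  # equals log(a)
    return math.exp(intercept), -slope


# Synthetic, made-up points (not results from the paper).
example_similarity = [0.1, 0.3, 0.5, 0.7]
example_gain = [10.0 * math.exp(-3.0 * s) for s in example_similarity]
a, b = fit_exponential_decay(example_similarity, example_gain)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A positive fitted b corresponds to the reported trend: the larger the similarity score, the smaller the expected gain from augmentation. The log-linear fit is a simple choice that breaks down when some gains are zero or negative; a nonlinear least-squares fit would be needed in that case.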
Abstract
Improving performance on an imbalanced training set is one of the main challenges in machine learning today. One way to augment, and thus re-balance, an image dataset is to synthesize images for each tail class with existing deep generative models, such as class-conditional Generative Adversarial Networks (cGANs) or diffusion models. Our experiments on imbalanced image dataset classification show that the validation accuracy improvement from this re-balancing method is related to the image similarity between different classes. To quantify this class similarity, we propose a measurement called Super-Sub Class Structural Similarity (SSIM-supSubCls), based on Structural Similarity (SSIM). A deep generative model data augmentation classification (GM-augCls) pipeline is also provided to verify that this metric correlates with the accuracy enhancement. We further…
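To make the underlying SSIM quantity concrete, here is a minimal sketch of the standard SSIM formula applied between two classes of images. It does not reproduce SSIM-supSubCls, whose super-class and sub-class grouping is not described in this summary. Two simplifications are assumed: SSIM is computed once over the whole image rather than over local windows, and class similarity is taken as the plain mean over all cross-class image pairs. The names `global_ssim` and `mean_cross_class_ssim` are hypothetical, and images are flat lists of grayscale pixel values.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equally sized grayscale images.

    x and y are flat sequences of pixel values. Standard SSIM averages
    this quantity over local windows; using one global window is a
    simplification made for this sketch.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

    # Stabilizing constants with the usual K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def mean_cross_class_ssim(class_a, class_b, data_range=255.0):
    """Average global SSIM over every (image in A, image in B) pair.

    A hypothetical stand-in for a class-similarity score, not the
    paper's SSIM-supSubCls definition.
    """
    scores = [global_ssim(x, y, data_range) for x in class_a for y in class_b]
    if not scores:
        raise ValueError("both classes need one or more images")
    return sum(scores) / len(scores)


# Tiny made-up 2x2 "images", flattened row by row.
stripes = [0, 255, 0, 255]
inverted = [255, 0, 255, 0]
print(global_ssim(stripes, stripes))
print(mean_cross_class_ssim([stripes], [stripes, inverted]))
```

Identical images score 1, and structurally opposite images score below 0, so a higher cross-class mean indicates classes that are harder to tell apart by structure alone.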
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · AI in cancer detection
Methods: Diffusion
