Flow Matching with Optimized Subclass Priors for Medical Image Augmentation
Felix Nützel, Mischa Dombrowski, Bernhard Kainz

TL;DR
This paper introduces a novel offline data augmentation method for medical imaging that uses subclass priors to improve the generation of rare disease images, enhancing classifier performance.
Contribution
It proposes a two-level prior strategy with Gaussian mixture modeling and subclass-conditioned source distributions to improve rare class augmentation.
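To illustrate the first level, the following sketch splits each coarse label into submodes by fitting a Gaussian mixture to its latent codes with expectation-maximization. It is not the authors' implementation. The diagonal covariance, the farthest-point initialization, the fixed number of components per label (`k_per_label`), and the helper names `fit_diag_gmm` and `assign_subclasses` are all assumptions made for illustration. The toy 2-D points stand in for real generative-model latents.

```python
import numpy as np


def fit_diag_gmm(z, k, n_iter=100, seed=0, eps=1e-6):
    """Fit a diagonal-covariance Gaussian mixture to z of shape (n, d) with EM.

    Returns (weights, means, variances, responsibilities).
    """
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    # Farthest-point initialization: a random first center, then repeatedly
    # the point farthest from all centers chosen so far.
    centers = [int(rng.integers(n))]
    for _ in range(1, k):
        sq_dist = ((z[:, None, :] - z[centers][None, :, :]) ** 2).sum(axis=2)
        centers.append(int(sq_dist.min(axis=1).argmax()))
    means = z[centers].copy()
    variances = np.tile(z.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log pi_j + log N(z_i | mu_j, diag(var_j)), shape (n, k).
        diff = z[:, None, :] - means[None, :, :]
        log_p = -0.5 * ((diff ** 2 / variances[None]).sum(axis=2)
                        + np.log(2 * np.pi * variances).sum(axis=1)[None])
        log_p += np.log(weights)[None]
        # Normalize in log space (log-sum-exp) for numerical stability.
        m = log_p.max(axis=1, keepdims=True)
        log_norm = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
        resp = np.exp(log_p - log_norm)
        # M-step: re-estimate mixture weights, means and per-dimension variances.
        nk = resp.sum(axis=0) + eps
        weights = nk / n
        means = (resp.T @ z) / nk[:, None]
        diff = z[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + eps
    return weights, means, variances, resp


def assign_subclasses(latents, labels, k_per_label, seed=0):
    """Split every coarse label into up to k_per_label submodes.

    Returns one global integer subclass id per sample.
    """
    subclass = np.empty(len(labels), dtype=int)
    next_id = 0
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)
        k = min(k_per_label, len(idx))
        *_, resp = fit_diag_gmm(latents[idx], k, seed=seed)
        # Hard assignment to the most responsible component, offset so ids
        # stay unique across coarse labels.
        subclass[idx] = next_id + resp.argmax(axis=1)
        next_id += k
    return subclass


rng = np.random.default_rng(0)
# Toy stand-in for latents: one coarse label hiding two submodes (100 vs 50).
latents = np.vstack([rng.normal(-4.0, 0.5, size=(100, 2)),
                     rng.normal(4.0, 0.5, size=(50, 2))])
labels = np.zeros(len(latents), dtype=int)
sub = assign_subclasses(latents, labels, k_per_label=2)
print(np.bincount(sub))
```

On this well-separated toy data, the two hidden clusters should land in different subclass ids. The resulting ids, rather than the coarse label, would then serve as the conditioning signal for generation.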
Findings
Improves tail-class generation fidelity and diversity (FID, IRS)
Enhances downstream balanced accuracy and macro-F1 across modalities
Consistently outperforms non-augmented baselines on benchmarks
Abstract
Rare diseases dominate the diagnostic challenge in medical imaging yet are severely underrepresented in clinical datasets, causing classifiers to fail on exactly the conditions where reliable detection matters most. Generative augmentation can supply the missing tail-class coverage, but coarse disease labels aggregate diverse subtypes and acquisition settings into multi-modal conditionals that bias generators toward dominant submodes, while a shared Gaussian source forces rare subpopulations through disproportionately long transport paths. We propose an offline strategy that introduces informative priors at two levels: first, we partition each coarse label into coherent submodes via Gaussian mixture modeling in the generative model's latent space; second, we learn subclass-conditioned source distributions that re-center and re-scale the starting distribution per submode, shortening…
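For the second level, here is a minimal sketch of how a subclass-conditioned source could enter a flow-matching training pair. It assumes the common linear interpolation path x_t = (1 - t) x_0 + t x_1 with target velocity x_1 - x_0, which is not confirmed by the text above. The abstract says the per-submode sources are learned. As a hypothetical stand-in, this sketch instead sets each source to the empirical mean and standard deviation of its submode's latents. `fit_subclass_sources` and `flow_matching_pair` are illustrative names, not the paper's code. The toy comparison measures the average straight-line distance from source sample to data point for a submode located far from the origin. It does this once for a shared N(0, I) source and once for the re-centered, re-scaled source.

```python
import numpy as np


def fit_subclass_sources(latents, subclass, min_std=1e-3):
    """Return {subclass_id: (mu, sigma)} for a diagonal Gaussian source per submode.

    Hypothetical stand-in: mu and sigma are the submode's empirical mean and
    std, whereas the paper learns the re-centering and re-scaling.
    """
    sources = {}
    for s in np.unique(subclass):
        z = latents[subclass == s]
        sources[int(s)] = (z.mean(axis=0), np.maximum(z.std(axis=0), min_std))
    return sources


def flow_matching_pair(x1, s, sources, rng):
    """Build one training example for conditional flow matching.

    Samples x0 from subclass s's source, draws t ~ U[0, 1), and returns
    (x_t, t, target) on the linear path, where target = x1 - x0 is the
    velocity a network v(x_t, t, s) would be regressed onto.
    """
    mu, sigma = sources[s]
    x0 = mu + sigma * rng.standard_normal(x1.shape)  # re-centered, re-scaled noise
    t = rng.uniform()
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, t, x1 - x0


rng = np.random.default_rng(0)
# Toy rare submode far from the origin (made-up statistics, 4-D latents).
rare = rng.normal(6.0, 0.5, size=(40, 4))
subclass = np.zeros(len(rare), dtype=int)
sources = fit_subclass_sources(rare, subclass)

# Average straight-line transport distance ||x1 - x0|| under each source.
mu, sigma = sources[0]
x0_shared = rng.standard_normal(rare.shape)
x0_sub = mu + sigma * rng.standard_normal(rare.shape)
shared_len = np.linalg.norm(rare - x0_shared, axis=1).mean()
sub_len = np.linalg.norm(rare - x0_sub, axis=1).mean()
print(f"shared N(0, I) source: {shared_len:.2f}, subclass source: {sub_len:.2f}")

x_t, t, target = flow_matching_pair(rare[0], 0, sources, rng)
```

With a moment-matched source, the path collapses to almost nothing. A learned source would sit somewhere between this extreme and the shared N(0, I), trading path length against sample diversity.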
