MeDi: Metadata-Guided Diffusion Models for Mitigating Biases in Tumor Classification
David Jacob Drexlin, Jonas Dippel, Julius Hense, Niklas Prenißl, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller

TL;DR
This paper introduces MeDi, a diffusion model that uses metadata to generate synthetic histopathology images, aiming to reduce biases and improve classifier performance across diverse subpopulations in tumor classification.
Contribution
The paper presents a novel metadata-guided diffusion framework (MeDi) for augmenting underrepresented subpopulations with synthetic data to mitigate biases in histological tumor classification.
Findings
MeDi generates high-quality images for unseen subpopulations.
Synthetic data from MeDi improves classifier performance on biased datasets.
MeDi enhances the robustness of models to subpopulation shifts.
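This summary does not say which metric is used to measure "robustness to subpopulation shifts." One common choice in the subgroup-robustness literature is worst-group accuracy: the accuracy on the weakest subpopulation rather than the average over all images. The minimal sketch below computes it under that assumption. The function names and the group labels (for example, a hospital or scanner identifier per image) are illustrative placeholders, not part of MeDi.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, groups):
    """Accuracy per subpopulation (hypothetical helper; groups = metadata label per sample)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(y_true, y_pred, groups):
    """Lowest per-group accuracy: a model that only fits the majority subgroup scores poorly here."""
    return min(group_accuracies(y_true, y_pred, groups).values())


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    y_pred = [1, 1, 0, 1]
    groups = ["hospital_A", "hospital_A", "hospital_B", "hospital_B"]
    print(group_accuracies(y_true, y_pred, groups))
    print(worst_group_accuracy(y_true, y_pred, groups))
```

In this toy example, overall accuracy is 0.75, but the worst group (hospital_B) reaches only 0.5. That gap is the kind of bias that targeted synthetic augmentation aims to narrow.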
Abstract
Deep learning models have made significant advances in histological prediction tasks in recent years. However, for adaptation in clinical practice, their lack of robustness to varying conditions such as staining, scanner, hospital, and demographics is still a limiting factor: if trained on overrepresented subpopulations, models regularly struggle with less frequent patterns, leading to shortcut learning and biased predictions. Large-scale foundation models have not fully eliminated this issue. Therefore, we propose a novel approach explicitly modeling such metadata into a Metadata-guided generative Diffusion model framework (MeDi). MeDi allows for a targeted augmentation of underrepresented subpopulations with synthetic data, which balances limited training data and mitigates biases in downstream models. We experimentally show that MeDi generates high-quality histopathology images for…
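The abstract describes "targeted augmentation of underrepresented subpopulations" but does not say how many synthetic images are generated per subgroup. The sketch below shows one simple planning rule, assuming subgroups are defined by combinations of metadata fields: top up every (class, metadata) combination to the size of the largest one. Combinations absent from the training data get the full quota, which matches the claim that MeDi can generate images for unseen subpopulations. The field names (`tumor_class`, `scanner`) and the rule itself are hypothetical illustrations, not the authors' procedure.

```python
from collections import Counter
from itertools import product


def augmentation_plan(records, group_keys):
    """Number of synthetic images to request per subgroup (hypothetical balancing rule).

    records: list of dicts holding metadata for each real training image.
    group_keys: metadata fields whose value combinations define a subpopulation.
    """
    if not records:
        return {}
    counts = Counter(tuple(r[k] for k in group_keys) for r in records)
    # Every observed value per field; their cross product includes unseen combinations.
    values = [sorted({r[k] for r in records}) for k in group_keys]
    target = max(counts.values())
    # Counter returns 0 for missing keys, so unseen subgroups receive the full target.
    return {g: target - counts[g] for g in product(*values)}


if __name__ == "__main__":
    records = (
        [{"tumor_class": "tumor", "scanner": "A"}] * 3
        + [{"tumor_class": "normal", "scanner": "A"}] * 2
        + [{"tumor_class": "tumor", "scanner": "B"}] * 1
    )
    print(augmentation_plan(records, ("tumor_class", "scanner")))
```

Here the combination ("normal", "B") never appears in the real data, so it would receive three synthetic images. Each entry in the plan would then become a metadata condition for the generator.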
Taxonomy
Topics: AI in cancer detection · Generative Adversarial Networks and Image Synthesis · Digital Imaging for Blood Diseases
Methods: Diffusion
