Improving Performance, Robustness, and Fairness of Radiographic AI Models with Finely-Controllable Synthetic Data
Stefania L. Moroianu, Christian Bluethgen, Pierre Chambon, Mehdi Cherti, Jean-Benoit Delbrouck, Magdalini Paschali, Brandon Price, Judy Gichoya, Jenia Jitsev, Curtis P. Langlotz, Akshay S. Chaudhari

TL;DR
This paper introduces RoentGen-v2, a controllable synthetic data generator for chest radiographs. Pretraining on its synthetic images improves the performance, robustness, and fairness of deep learning models across diverse patient demographics.
Contribution
We develop RoentGen-v2, the first demographic-conditioned diffusion model for clinically plausible chest radiograph synthesis, and demonstrate its effectiveness in enhancing model performance and fairness.
Findings
Synthetic pretraining improves classification accuracy by 6.5%.
The fairness gap shrinks by 19.3% with synthetic pretraining.
Synthetic data enhances model generalization across institutions.
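To make the fairness finding concrete, here is a minimal sketch of how a gap of this kind could be measured. This page does not define the fairness gap, so the sketch assumes it is the difference between the highest and lowest per-group false negative rate; the function names, the toy labels, and the groups "A" and "B" are all hypothetical.

```python
from collections import defaultdict


def false_negative_rate_by_group(y_true, y_pred, groups):
    """Return the fraction of missed positive cases for each group."""
    positives = defaultdict(int)
    misses = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 0:
                misses[group] += 1
    return {g: misses[g] / positives[g] for g in positives}


def fairness_gap(rates):
    """Assumed definition: worst-group rate minus best-group rate."""
    return max(rates.values()) - min(rates.values())


# Toy data: every case is truly positive; group B is missed more often.
y_true = [1, 1, 1, 1, 1, 1, 1, 1]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = false_negative_rate_by_group(y_true, y_pred, groups)
gap = fairness_gap(rates)
print(rates, gap)
```

Under this assumed definition, a smaller gap means the model misses disease at more similar rates across groups. Whether the reported 19.3% is a relative or an absolute reduction is not stated here.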
Abstract
Achieving robust performance and fairness across diverse patient populations remains a challenge in developing clinically deployable deep learning models for diagnostic imaging. Synthetic data generation has emerged as a promising strategy to address limitations in dataset scale and diversity. We introduce RoentGen-v2, a text-to-image diffusion model for chest radiographs that enables fine-grained control over both radiographic findings and patient demographic attributes, including sex, age, and race/ethnicity. RoentGen-v2 is the first model to generate clinically plausible images with demographic conditioning, facilitating the creation of a large, demographically balanced synthetic dataset comprising over 565,000 images. We use this large synthetic dataset to evaluate optimal training pipelines for downstream disease classification models. In contrast to prior work that combines real…
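The idea of a demographically balanced synthetic dataset can be illustrated with a short sketch that enumerates one text prompt per combination of demographic attributes and finding. The attribute vocabularies, the prompt template, and the function name below are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import product

# Hypothetical attribute vocabularies (not the paper's actual categories).
SEXES = ["female", "male"]
AGE_GROUPS = ["18-40", "40-60", "60-80"]
RACES = ["Asian", "Black", "White"]
FINDINGS = ["no finding", "pleural effusion", "cardiomegaly"]


def build_balanced_prompts(per_cell=1):
    """Build prompts so every attribute combination appears equally often."""
    prompts = []
    for sex, age, race, finding in product(SEXES, AGE_GROUPS, RACES, FINDINGS):
        # Hypothetical prompt template for a text-to-image model.
        text = f"{age} year old {race} {sex}. Chest radiograph showing {finding}."
        prompts.extend([text] * per_cell)
    return prompts


prompts = build_balanced_prompts(per_cell=2)
print(len(prompts))  # 2 sexes * 3 ages * 3 races * 3 findings * 2 copies = 108
```

Because each cell of the attribute grid receives the same number of prompts, no demographic subgroup is underrepresented for any finding, which is the property a balanced dataset is meant to provide.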
