EMIXER: End-to-end Multimodal X-ray Generation via Self-supervision
Siddharth Biswal, Peiye Zhuang, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Jimeng Sun

TL;DR
EMIXER is an end-to-end multimodal generative model that synthesizes X-ray images and reports conditioned on diagnosis labels, leveraging self-supervision to improve clinical data augmentation and machine learning tasks.
Contribution
This paper introduces EMIXER, a novel multimodal generative adversarial network that jointly synthesizes X-ray images and reports with self-supervision, enhancing clinical data augmentation.
Findings
Synthetic data improves COVID-19 X-ray classification by 5.94%
Generated images and reports are validated by radiologists
Data augmentation boosts report generation and classification performance
Abstract
Deep generative models have enabled the automated synthesis of high-quality data for diverse applications. However, the most effective generative models are specialized to data from a single domain (e.g., images or text). Real-world applications such as healthcare require multi-modal data from multiple domains (e.g., both images and corresponding text), which are difficult to acquire due to limited availability and privacy concerns and are much harder to synthesize. To tackle this joint synthesis challenge, we propose an End-to-end MultImodal X-ray genERative model (EMIXER) for jointly synthesizing X-ray images and corresponding free-text reports, all conditional on diagnosis labels. EMIXER is a conditional generative adversarial model that works by 1) generating an image based on a label, 2) encoding the image to a hidden embedding, 3) producing the corresponding text via a hierarchical decoder…
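The three steps listed in the abstract can be pictured as a simple data flow from label to image to embedding to report. The sketch below is only a toy illustration of that flow under stated assumptions: it is not the authors' implementation, it contains no neural networks, discriminators, or adversarial training, and every name in it (generate_image, encode_image, decode_report, synthesize, VOCAB) is hypothetical.

```python
import random

# Hypothetical toy vocabulary; the real model would use words from radiology reports.
VOCAB = ["lungs", "clear", "opacity", "noted", "heart", "normal", "effusion", "present"]

def generate_image(label, size=8, seed=0):
    """Step 1 (toy): produce a size x size 'image' whose pixels depend on the label."""
    rng = random.Random(seed * 1000 + label)
    return [[rng.random() for _ in range(size)] for _ in range(size)]

def encode_image(image, dim=4):
    """Step 2 (toy): pool the image into a fixed-length embedding.

    Each embedding entry is the mean pixel value of one horizontal band of rows.
    """
    rows_per_band = len(image) // dim
    embedding = []
    for b in range(dim):
        band = image[b * rows_per_band:(b + 1) * rows_per_band]
        pixels = [p for row in band for p in row]
        embedding.append(sum(pixels) / len(pixels))
    return embedding

def decode_report(embedding, words_per_sentence=3):
    """Step 3 (toy): hierarchical decoding.

    Sentence level: each embedding entry acts as the 'topic' of one sentence.
    Word level: each topic is expanded into a short run of vocabulary words.
    """
    sentences = []
    for topic in embedding:
        start = int(topic * len(VOCAB)) % len(VOCAB)
        words = [VOCAB[(start + i) % len(VOCAB)] for i in range(words_per_sentence)]
        sentences.append(" ".join(words) + ".")
    return " ".join(sentences)

def synthesize(label, seed=0):
    """Chain the three steps: label -> image -> embedding -> report."""
    image = generate_image(label, seed=seed)
    embedding = encode_image(image)
    report = decode_report(embedding)
    return image, report
```

Calling synthesize(1) returns an 8 x 8 toy image together with a four-sentence toy report. The point of the sketch is only that the report is derived from the generated image's embedding rather than directly from the label, which mirrors the ordering of the steps in the abstract.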
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Topic Modeling · Multimodal Machine Learning Applications
