MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
Chenlu Zhan, Yu Lin, Gaoang Wang, Hongwei Wang, Jian Wu

TL;DR
MedM2G is a unified medical multi-modal generative framework that aligns, extracts, and generates diverse medical data types, improving multi-modal medical diagnosis and surpassing existing models across multiple tasks and datasets.
Contribution
It introduces the first unified model for medical multi-modal generation, leveraging visual invariants and cross-guided diffusion to enhance medical data synthesis.
Findings
Outperforms state-of-the-art models on 5 tasks across 10 datasets.
Effectively unifies text-to-image, image-to-text, and multi-modal medical generation.
Preserves medical visual invariants to improve generation quality.
Abstract
Medical generative models, acknowledged for their high-quality sample generation ability, have accelerated the fast growth of medical applications. However, recent works concentrate on separate medical generation models for distinct medical tasks and are restricted to inadequate medical multi-modal knowledge, constraining medical comprehensive diagnosis. In this paper, we propose MedM2G, a Medical Multi-Modal Generative framework, with the key innovation to align, extract, and generate medical multi-modal within a unified model. Extending beyond single or two medical modalities, we efficiently align medical multi-modal through the central alignment approach in the unified space. Significantly, our framework extracts valuable clinical knowledge by preserving the medical visual invariant of each imaging modal, thereby enhancing specific medical information for multi-modal generation. By…
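The "central alignment" idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not specify the objective, so this sketch assumes a contrastive (InfoNCE-style) loss that pulls each modality's embeddings toward those of one chosen central modality. The function names `info_nce` and `central_alignment_loss`, the temperature value, and the use of text as the center are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of paired embeddings.

    Row i of `a` and row i of `b` are assumed to describe the same sample.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature

    def cross_entropy_diag(z):
        # Numerically stable log-softmax; the matching pair sits on the diagonal.
        z = z - z.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

def central_alignment_loss(center, others, temperature=0.07):
    # Hypothetical helper: align every non-central modality to the center only,
    # so the number of pairwise terms grows linearly with the modality count.
    return sum(info_nce(center, emb, temperature) for emb in others.values())

# Toy data: 8 paired samples with 16-dimensional embeddings per modality.
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))  # assumed central modality
aligned = {m: text + 0.01 * rng.normal(size=text.shape) for m in ("ct", "mri", "xray")}
unaligned = {m: rng.normal(size=text.shape) for m in ("ct", "mri", "xray")}

loss_aligned = central_alignment_loss(text, aligned)
loss_unaligned = central_alignment_loss(text, unaligned)
```

In this toy setup, embeddings that already sit close to the central modality should yield a much smaller loss than unrelated random embeddings. Aligning every modality to a single center, instead of to every other modality, is one way to keep the alignment cost manageable as modalities are added.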
Taxonomy
Topics: AI in cancer detection · Image Retrieval and Classification Techniques · Biomedical Text Mining and Ontologies
Methods: Diffusion · ALIGN
