TL;DR
MeDUET is a pretraining framework that unifies 3D medical image analysis and synthesis by disentangling anatomical content from acquisition style in the variational autoencoder (VAE) latent space, exploiting the heterogeneity of multi-source data.
Contribution
MeDUET presents a disentanglement approach that separates anatomy (content) from appearance (style), which improves synthesis, analysis, and domain generalization in 3D medical imaging.
Findings
Higher fidelity and faster convergence in synthesis.
Better domain generalization and label efficiency.
Effective factor separation and controllability.
Abstract
Self-supervised learning (SSL) and diffusion models have advanced representation learning and image synthesis, but in 3D medical imaging they are still largely used separately for analysis and synthesis, respectively. Unifying them is appealing but difficult, because multi-source data exhibit pronounced style shifts while downstream tasks rely primarily on anatomy, causing anatomical content and acquisition style to become entangled. In this paper, we propose MeDUET, a 3D Medical image Disentangled UnifiEd PreTraining framework in the variational autoencoder latent space. Our central idea is to treat unified pretraining under heterogeneous multi-center data as a factor identifiability problem, where content should consistently capture anatomy and style should consistently capture appearance. MeDUET addresses this problem through three components. Token demixing provides controllable…
