Domain-Specific Latent Representations Improve the Fidelity of Diffusion-Based Medical Image Super-Resolution
Sebastian Cajas, Ashaba Judith, Rahul Gorijavolu, Sahil Kapadia, Hillary Clinton Kasimbazi, Leo Kinyera, Emmanuel Paul Kwesiga, Sri Sri Jaithra Varma Manthena, Luis Filipe Nakayama, Ninsiima Doreen, Leo Anthony Celi

TL;DR
Swapping the generic VAE for a domain-specific autoencoder raises PSNR in diffusion-based medical image super-resolution while hallucination rates stay stable, which makes autoencoder choice a central design decision.
Contribution
Demonstrates that replacing generic VAEs with domain-specific autoencoders improves super-resolution performance and provides a practical criterion for autoencoder selection.
Findings
Replacing the generic Stable Diffusion VAE with MedVAE yields a +2.91 to +3.29 dB PSNR improvement across knee MRI, brain MRI, and chest X-ray.
Wavelet decomposition localizes the advantage to fine spatial frequency bands.
Autoencoder quality predicts downstream super-resolution performance with R^2 = 0.67.
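The first two findings rest on two measurements: PSNR in dB, and reconstruction error split by wavelet band. The sketch below illustrates both with NumPy only. It is not the authors' evaluation code: the function names, the `data_range` default of 1.0, and the use of a single-level Haar wavelet are assumptions chosen to keep the example short, and the paper may use a different wavelet family and more decomposition levels.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB for two same-shaped images.

    data_range is the assumed maximum pixel value (1.0 for images in [0, 1]).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))


def haar_level1(image):
    """Single-level orthonormal 2-D Haar transform of a 2-D array.

    Returns (approx, detail_rows, detail_cols, detail_diag). Odd-sized
    inputs are cropped by one pixel so that 2x2 blocks tile the image.
    """
    x = np.asarray(image, dtype=np.float64)
    h, w = x.shape
    x = x[: h - h % 2, : w - w % 2]
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    approx = (a + b + c + d) / 2.0       # coarse content
    detail_rows = (a + b - c - d) / 2.0  # top row minus bottom row
    detail_cols = (a - b + c - d) / 2.0  # left column minus right column
    detail_diag = (a - b - c + d) / 2.0  # diagonal difference
    return approx, detail_rows, detail_cols, detail_diag


def band_errors(reference, estimate):
    """Sum of squared error per Haar band between two images."""
    names = ("approx", "detail_rows", "detail_cols", "detail_diag")
    ref_bands = haar_level1(reference)
    est_bands = haar_level1(estimate)
    return {
        name: float(np.sum((r - e) ** 2))
        for name, r, e in zip(names, ref_bands, est_bands)
    }
```

Because this Haar transform is orthonormal, the per-band errors add up to the total squared error for even-sized images. A gap between two reconstructions can therefore be attributed band by band, and a gap that sits mostly in the detail bands is a gap in fine structure.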
Abstract
Latent diffusion models for medical image super-resolution universally inherit variational autoencoders designed for natural photographs. We show that this default choice, not the diffusion architecture, is the dominant constraint on reconstruction quality. In a controlled experiment holding all other pipeline components fixed, replacing the generic Stable Diffusion VAE with MedVAE, a domain-specific autoencoder pretrained on more than 1.6 million medical images, yields +2.91 to +3.29 dB PSNR improvement across knee MRI, brain MRI, and chest X-ray (n = 1,820; Cohen's d = 1.37 to 1.86, all p < 10^{-20}, Wilcoxon signed-rank). Wavelet decomposition localises the advantage to the finest spatial frequency bands encoding anatomically relevant fine structure. Ablations across inference schedules, prediction targets, and generative architectures confirm the gap is stable within plus or minus…
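The abstract reports the effect size as Cohen's d on paired measurements. Below is a minimal sketch of one common paired-sample convention, the mean of the per-image differences divided by their sample standard deviation. This convention and the function name `paired_cohens_d` are assumptions, since the abstract does not say which variant of d was used. The Wilcoxon signed-rank test is not reproduced here.

```python
import numpy as np


def paired_cohens_d(baseline, treatment):
    """Paired-sample Cohen's d: mean difference over the SD of the differences.

    baseline and treatment hold one score per image (for example PSNR in dB),
    in the same order, so that element i of each refers to the same image.
    """
    baseline = np.asarray(baseline, dtype=np.float64)
    treatment = np.asarray(treatment, dtype=np.float64)
    if baseline.shape != treatment.shape:
        raise ValueError("paired inputs must have the same shape")
    diff = treatment - baseline
    return float(diff.mean() / diff.std(ddof=1))  # ddof=1 gives the sample SD
```

With hypothetical per-image PSNR values such as `paired_cohens_d([20, 21, 22, 23], [23, 23, 26, 26])`, the differences are 3, 2, 4, and 3 dB, so the result is the mean difference of 3 dB divided by the standard deviation of those four differences.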
