Improving text-conditioned latent diffusion for cancer pathology
Aakash Madhav Rao, Debayan Gupta

TL;DR
This paper enhances latent diffusion models for cancer pathology image synthesis by addressing current limitations, achieving better image quality and reduced computational costs compared to previous methods.
Contribution
It identifies pitfalls in existing models, corrects critical errors, and proposes improvements that outperform the state of the art in both FID score and GPU memory efficiency.
Findings
Achieved an FID score of 21.11, improving on previous models by 1.2 (lower FID is better).
Reduced training-time GPU memory usage by 7%.
Improved realism and efficiency in histopathology image generation.
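For readers unfamiliar with the metric, the standard definition of the Fréchet Inception Distance (FID) is reproduced here. It is the general formula, not something specific to this paper, and the symbols are introduced only for illustration: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of feature embeddings (typically from an Inception-v3 network) of real and generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The score is zero when the two feature distributions have identical means and covariances, which is why a lower value indicates more realistic samples.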
Abstract
The development of generative models in the past decade has allowed for hyperrealistic data synthesis. While potentially beneficial, this synthetic data generation process has been relatively underexplored in cancer histopathology. One algorithm for synthesising a realistic image is diffusion; it iteratively converts an image to noise and learns the recovery process from this noise [Wang and Vastola, 2023]. While effective, it is highly computationally expensive for high-resolution images, rendering it infeasible for histopathology. The development of Variational Autoencoders (VAEs) has allowed us to learn the representation of complex high-resolution images in a latent space. A vital by-product of this is the ability to compress high-resolution images into this latent space and recover them losslessly. The marriage of diffusion and VAEs allows us to carry out diffusion in the latent space of an…
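The "image to noise" half of diffusion described in the abstract can be sketched in a few lines. This is a minimal illustration of one common formulation (the DDPM-style closed-form noising step), not the authors' implementation. The linear schedule, its endpoint values, the 1000-step count, and the 16-value list standing in for a VAE-encoded latent are all assumptions made for the example.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the endpoint values are illustrative."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta[s]) for all s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(latent, t, alpha_bars, rng):
    """Sample z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    noisy = [signal * z + sigma * e for z, e in zip(latent, noise)]
    return noisy, noise

rng = random.Random(0)
# Assumption: a flat list of 16 values stands in for a VAE-encoded image.
latent = [rng.gauss(0.0, 1.0) for _ in range(16)]
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

noisy_early, _ = add_noise(latent, 10, alpha_bars, rng)     # still mostly signal
noisy_late, eps = add_noise(latent, 999, alpha_bars, rng)   # almost pure noise
```

In this formulation a denoising network would be trained to predict `eps` from the noisy latent and the step index. Working on a small latent rather than on full-resolution pixels is what keeps each step cheap.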
Taxonomy
Topics: Biomedical Text Mining and Ontologies
Methods: Diffusion
