Text2CT: Towards 3D CT Volume Generation from Free-text Descriptions Using Diffusion Model

Pengfei Guo; Can Zhao; Dong Yang; Yufan He; Vishwesh Nath; Ziyue Xu; Pedro R. A. S. Bassi; Zongwei Zhou; Benjamin D. Simon; Stephanie Anne Harmon; Baris Turkbey; Daguang Xu

arXiv:2505.04522·eess.IV·May 9, 2025

Open Access

TL;DR

Text2CT is a diffusion-based method that generates high-resolution 3D CT volumes from free-text descriptions, with intended applications in diagnostics and research.

Contribution

The paper presents a novel diffusion model framework that encodes medical text into latent space and decodes it into 3D CT scans, handling diverse free-text inputs unlike previous fixed-format methods.

Findings

01 Achieves state-of-the-art accuracy in anatomical fidelity

02 Effectively captures intricate structures from text descriptions

03 Demonstrates potential for diagnostics and data augmentation

Abstract

Generating 3D CT volumes from descriptive free-text inputs presents a transformative opportunity in diagnostics and research. In this paper, we introduce Text2CT, a novel approach for synthesizing 3D CT volumes from textual descriptions using the diffusion model. Unlike previous methods that rely on fixed-format text input, Text2CT employs a novel prompt formulation that enables generation from diverse, free-text descriptions. The proposed framework encodes medical text into latent representations and decodes them into high-resolution 3D CT scans, effectively bridging the gap between semantic text inputs and detailed volumetric representations in a unified 3D framework. Our method demonstrates superior performance in preserving anatomical fidelity and capturing intricate structures as described in the input text. Extensive evaluations show that our approach achieves state-of-the-art…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis

Methods: Diffusion