TL;DR
This paper introduces DiffKT3D, a novel 3D diffusion model leveraging pretrained video diffusion knowledge and clinical data conditioning, achieving state-of-the-art dose prediction in radiotherapy planning.
Contribution
The work presents a unified diffusion framework with modality-specific conditioning and reinforcement learning (RL) post-training, improving generalization and clinical relevance in dose prediction.
Findings
DiffKT3D reduces voxel-level MAE from 2.07 to 1.93.
Achieves superior image quality and preference match.
Sets new state-of-the-art in dose prediction.
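To make the headline number concrete, here is a minimal sketch of how a voxel-level mean absolute error (MAE) can be computed, and what the reported drop from 2.07 to 1.93 means in relative terms. The function names, the flattened-list representation of a dose volume, and the optional mask are illustrative assumptions; the paper's exact evaluation protocol is not described on this page.

```python
from typing import Optional, Sequence


def voxel_mae(pred: Sequence[float],
              target: Sequence[float],
              mask: Optional[Sequence[bool]] = None) -> float:
    """Mean absolute error over voxels of two flattened dose volumes.

    `mask` (hypothetical) restricts the average to selected voxels,
    e.g. those inside the body contour.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")
    if mask is None:
        mask = [True] * len(pred)
    if len(mask) != len(pred):
        raise ValueError("mask must have the same number of voxels as pred")
    errors = [abs(p - t) for p, t, m in zip(pred, target, mask) if m]
    if not errors:
        raise ValueError("mask selects no voxels")
    return sum(errors) / len(errors)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional improvement of `improved` over `baseline`."""
    return (baseline - improved) / baseline


# Toy volumes with three voxels each (made-up values, not from the paper).
toy_pred = [1.0, 2.0, 4.0]
toy_target = [1.0, 3.0, 2.0]
toy_mae = voxel_mae(toy_pred, toy_target)
toy_masked_mae = voxel_mae(toy_pred, toy_target, mask=[True, True, False])

# The MAE figures reported in the findings above.
reported_gain = relative_reduction(2.07, 1.93)
```

On the reported figures, `relative_reduction(2.07, 1.93)` works out to roughly 0.068, so the improvement is about 6.8% relative to the previous best.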
Abstract
Voxel-wise dose prediction is a critical yet challenging task in practical radiotherapy (RT) planning, as bespoke models trained from scratch often struggle to generalize across diverse clinical settings. Meanwhile, generative models trained on billion-scale datasets from vision domains have achieved impressive performance. Herein, we propose DiffKT3D, a unified Any2Any 3D diffusion framework that leverages prior knowledge from pretrained video diffusion models for efficient and clinically meaningful dose prediction. To enable flexible conditioning across multiple clinical modalities (CT, anatomical structures, body, beam settings, etc.), we introduce an Any2Any conditional paradigm utilizing modality-specific embeddings without cross-attention overhead. Further, we design a novel reinforcement learning (RL) post-training mechanism guided by a clinically-informed Scorecard explicitly…
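The abstract says conditioning uses modality-specific embeddings rather than cross-attention. One common way to realize that idea, sketched below, is to tag each modality's tokens with its own embedding vector and concatenate everything into a single sequence for the backbone. This is only an illustration under stated assumptions: the token representation, the additive tagging, the random stand-in for learned embeddings, and every name here (`make_modality_embeddings`, `build_sequence`) are hypothetical and not taken from DiffKT3D.

```python
import random
from typing import Dict, List, Sequence

Token = List[float]


def make_modality_embeddings(modalities: Sequence[str],
                             dim: int,
                             seed: int = 0) -> Dict[str, Token]:
    """One embedding vector per modality (random stand-in for learned weights)."""
    rng = random.Random(seed)
    return {name: [rng.gauss(0.0, 0.02) for _ in range(dim)]
            for name in modalities}


def build_sequence(inputs: Dict[str, List[Token]],
                   embeddings: Dict[str, Token]) -> List[Token]:
    """Add each modality's embedding to its tokens, then concatenate.

    The result is a single token sequence in which every token carries
    its modality identity, so no separate cross-attention branch is
    needed to tell conditions apart.
    """
    sequence: List[Token] = []
    for name, tokens in inputs.items():
        emb = embeddings[name]
        for tok in tokens:
            if len(tok) != len(emb):
                raise ValueError(f"token width mismatch for modality {name!r}")
            sequence.append([t + e for t, e in zip(tok, emb)])
    return sequence


# Toy example: three modalities with different token counts, width 4.
dim = 4
embeddings = make_modality_embeddings(["ct", "structures", "dose"], dim)
inputs = {
    "ct": [[0.0] * dim for _ in range(2)],
    "structures": [[0.0] * dim],
    "dose": [[1.0] * dim for _ in range(3)],
}
sequence = build_sequence(inputs, embeddings)
```

In this sketch, adding or dropping a modality only changes which entries appear in `inputs`, which is the kind of flexibility an Any2Any setup needs.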
