A Multimodal Clinically Informed Coarse-to-Fine Framework for Longitudinal CT Registration in Proton Therapy

Caiwen Jiang; Yuzhen Ding; Mi Jia; Samir H. Patel; Terence T. Sio; Jonathan B. Ashman; Lisa A. McGee; Jean-Claude M. Rwigema; William G. Rule; Sameer R. Keole; Sujay A. Vora; William W. Wong; Nathan Y. Yu; Michele Y. Halyard; Steven E. Schild; Dinggang Shen; Wei Liu

arXiv:2604.13397 · cs.CV · April 16, 2026

TL;DR

This paper introduces a clinically informed, multimodal, coarse-to-fine deep learning framework for deformable image registration in longitudinal CT scans, enhancing accuracy and speed for proton therapy workflows.

Contribution

It presents a novel hierarchical model that integrates multimodal clinical priors with attention mechanisms for fast, accurate, and clinically relevant CT registration.

Findings

01 Achieved superior registration accuracy over existing methods.

02 Enabled faster, more robust registration suitable for clinical workflows.

03 Validated on a large dataset of 1,222 CT scan pairs across multiple scenarios.

Abstract

Proton therapy offers superior organ-at-risk sparing but is highly sensitive to anatomical changes, making accurate deformable image registration (DIR) across longitudinal CT scans essential. Conventional DIR methods are often too slow for emerging online adaptive workflows, while existing deep learning-based approaches are primarily designed for generic benchmarks and underutilize clinically relevant information beyond images. To address this gap, we propose a clinically scalable coarse-to-fine deformable registration framework that integrates multimodal information from the proton radiotherapy workflow to accommodate diverse clinical scenarios. The model employs dual CNN-based encoders for hierarchical feature extraction and a transformer-based decoder to progressively refine deformation fields. Beyond CT intensities, clinically critical priors, including target and organ-at-risk…
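The abstract describes the design only at a high level: dual CNN encoders and a transformer decoder that progressively refines deformation fields. It does not specify how the fields are combined across scales, so the sketch below is not the authors' code. It is a generic 2D NumPy illustration of one common coarse-to-fine pattern. A coarse displacement field is upsampled, and a finer residual field predicted on the coarsely warped image is then composed with it. The function names (`warp`, `upsample_field`, `compose`), the 2D setting, the factor-of-two pyramid, and the nearest-neighbour upsampling are all hypothetical choices made for brevity. Real CT registration is 3D and would use the network's own predicted fields.

```python
import numpy as np

# Hypothetical 2D sketch of coarse-to-fine displacement-field refinement.
# Not the paper's implementation; fields are (2, H, W) arrays in pixel units.


def bilinear_sample(img, ys, xs):
    """Sample a 2D array at fractional (ys, xs); out-of-range coords are clamped."""
    h, w = img.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0
    wx = xs - x0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def warp(img, disp):
    """Backward warp: out(y, x) = img(y + disp[0, y, x], x + disp[1, y, x])."""
    h, w = img.shape
    gy, gx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return bilinear_sample(img, gy + disp[0], gx + disp[1])


def upsample_field(disp):
    """Upsample a (2, h, w) field to (2, 2h, 2w) and double its vectors.

    Nearest-neighbour repetition keeps the sketch short (an assumption);
    vectors are doubled because one coarse pixel spans two fine pixels.
    """
    return 2.0 * disp.repeat(2, axis=1).repeat(2, axis=2)


def compose(coarse_up, residual):
    """Combine fields so warp(img, total) ~= warp(warp(img, coarse_up), residual).

    total(x) = residual(x) + coarse_up(x + residual(x))
    """
    h, w = residual.shape[1:]
    gy, gx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ys, xs = gy + residual[0], gx + residual[1]
    total = np.empty_like(residual)
    for c in range(2):
        total[c] = residual[c] + bilinear_sample(coarse_up[c], ys, xs)
    return total
```

As a toy usage example, take an 8×8 "moving" image, a 4×4 coarse field with a constant 0.5-pixel x-shift (1 pixel after upsampling), and a hypothetical residual with a 1-pixel y-shift. Warping once with `compose(upsample_field(coarse), residual)` gives the same result as warping with the two fields in sequence. Composing fields, rather than simply adding them, keeps the residual defined relative to the coarsely aligned image, which is what a refinement stage actually sees.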
