# Multi-Patient Vision Transformer for Markerless Tumor Motion Forecasting

**Authors:** Gauthier Rotsart de Hertaing, Dani Manjah, Benoît Macq

PMC · DOI: 10.3390/biomedicines14030496 · Biomedicines · 2026-02-25

## TL;DR

This paper introduces a vision transformer model that forecasts three-dimensional lung tumor positions over short horizons during radiotherapy, without relying on physical markers.

## Contribution

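A multi-patient vision transformer, trained on digitally reconstructed radiographs from 12 lung cancer patients and then fine-tuned with a small number of patient-specific treatment images, extends markerless tumor tracking to short-term motion forecasting.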

## Key findings

- Low-resolution inputs with larger patch sizes outperform higher-resolution configurations, an effect the results attribute to reduced image noise.
- Fine-tuning a multi-patient model with limited patient-specific data achieves comparable or better accuracy than patient-specific models.
- The method enables efficient and accurate short-term tumor motion prediction under clinical constraints.
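To make the resolution and patch-size trade-off in the first finding concrete, the sketch below counts how many tokens a vision transformer would see per frame for two configurations. The image sizes and patch sizes are hypothetical placeholders, not the settings reported in the paper, and the helper name `tokens_per_frame` is introduced here for illustration only. The one value taken from the abstract is the sequence length of 12 DRRs.

```python
def tokens_per_frame(height: int, width: int, patch_size: int) -> int:
    """Number of non-overlapping square patches extracted from one image."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch size")
    return (height // patch_size) * (width // patch_size)


# Hypothetical configurations, NOT the settings reported in the paper.
configs = {
    "low-res, large patches": (128, 128, 32),
    "high-res, small patches": (256, 256, 16),
}

SEQUENCE_LENGTH = 12  # DRRs per input sequence, as stated in the abstract

for name, (h, w, p) in configs.items():
    per_frame = tokens_per_frame(h, w, p)
    # Assumes every frame is tokenized independently and all tokens are kept.
    per_sequence = per_frame * SEQUENCE_LENGTH
    print(f"{name}: {per_frame} tokens/frame, {per_sequence} tokens/sequence")
```

Because standard self-attention scales quadratically with token count, a configuration with fewer, larger patches is also cheaper to run, which may matter under the clinical constraints mentioned above.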

## Abstract

Background: Accurate forecasting of lung tumor motion is crucial for precise radiotherapy. Deep-learning-based markerless tracking methods have been explored, but extending these approaches to predict future tumor trajectories remains largely unaddressed. We address this by framing markerless lung tumor motion forecasting as a spatio-temporal prediction task using a vision transformer to estimate three-dimensional tumor positions over short horizons.

Methods: Digitally reconstructed radiographs (DRRs) generated from four-dimensional computed tomography scans of 12 lung cancer patients were used to train a multi-patient (MP) model. Patient-specific (PS) models trained solely on planning data were compared, and the MP model was further fine-tuned using a small number of patient-specific treatment images under realistic clinical constraints. Models processed sequences of 12 DRRs, with performance evaluated via root mean square error.

Results: The results indicate that low-resolution inputs with larger patch sizes outperform higher-resolution configurations by reducing image noise. PS models require extensive data to match MP performance, whereas fine-tuning the MP model with limited patient-specific data achieves comparable or superior forecasting accuracy at a lower cost.

Conclusions: These findings demonstrate that Vision Transformers can extend markerless tracking methods to accurate short-term forecasting and highlight fine-tuning as an efficient strategy for personalized prediction.
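The abstract describes two mechanics that are easy to illustrate: feeding the model sequences of 12 DRRs and scoring forecasts with root mean square error. The following is a minimal sketch under stated assumptions, not the authors' pipeline. The function names `make_windows` and `rmse_3d`, the `horizon` parameter, and the choice to compute RMSE over Euclidean 3D errors (rather than per axis) are all assumptions made here; this summary does not specify them.

```python
import math


def make_windows(num_frames, seq_len=12, horizon=1):
    """Return (input_indices, target_index) pairs for sliding-window forecasting.

    seq_len=12 follows the abstract; horizon (frames ahead) is hypothetical.
    """
    windows = []
    last_start = num_frames - seq_len - horizon
    for start in range(last_start + 1):
        inputs = list(range(start, start + seq_len))
        target = start + seq_len + horizon - 1
        windows.append((inputs, target))
    return windows


def rmse_3d(predicted, actual):
    """RMSE over Euclidean distances between predicted and true 3D positions.

    Assumption: the error per sample is the 3D Euclidean distance.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        sum((p - a) ** 2 for p, a in zip(pred, act))
        for pred, act in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy data: 15 frames yield 3 windows of 12 inputs with a 1-frame horizon.
windows = make_windows(15)
print(len(windows), windows[0][1])

# Toy positions in arbitrary units (not data from the paper).
pred = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
true = [(0.0, 0.0, 1.0), (1.0, 1.0, 3.0)]
print(rmse_3d(pred, true))
```

A per-axis RMSE would be an equally plausible reading of the abstract; swapping the inner sum for a single coordinate difference gives that variant.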

## Linked entities

- **Diseases:** lung cancer (MONDO:0005138)

## Full-text entities

- **Diseases:** lung cancer (MESH:D008175), Tumor (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13023511/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13023511/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC13023511/full.md

---
Source: https://tomesphere.com/paper/PMC13023511