Stochastic Siamese MAE Pretraining for Longitudinal Medical Images

Taha Emre; Arunava Chakravarty; Thomas Pinetz; Dmitrii Lachinov; Martin J. Menten; Hendrik Scholl; Sobha Sivaprasad; Daniel Rueckert; Andrew Lotery; Stefan Sacu; Ursula Schmidt-Erfurth; Hrvoje Bogunović

arXiv:2512.23441·cs.LG·December 30, 2025

Open Access

TL;DR

STAMP introduces a stochastic Siamese MAE framework that effectively captures temporal uncertainty in longitudinal medical images, improving disease progression prediction over existing methods.

Contribution

The paper presents a novel stochastic approach to temporal encoding in MAE, reframing the MAE reconstruction loss as a conditional variational inference objective to better model disease evolution.

Findings

01

Outperforms existing temporal MAE methods on OCT and MRI datasets.

02

Pretrained ViT models achieve higher accuracy in disease progression prediction.

03

Effectively models non-deterministic disease dynamics.

Abstract

Temporally aware image representations are crucial for capturing disease progression in 3D volumes of longitudinal medical datasets. However, recent state-of-the-art self-supervised learning approaches such as Masked Autoencoding (MAE), despite their strong representation learning capabilities, lack temporal awareness. In this paper, we propose STAMP (Stochastic Temporal Autoencoder with Masked Pretraining), a Siamese MAE framework that encodes temporal information through a stochastic process by conditioning on the time difference between the two input volumes. Unlike deterministic Siamese approaches, which compare scans from different time points but fail to account for the inherent uncertainty in disease evolution, STAMP learns temporal dynamics stochastically by reframing the MAE reconstruction loss as a conditional variational inference objective. We evaluated STAMP on two OCT and one…
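
To make the phrase "reframing the MAE reconstruction loss as a conditional variational inference objective" more concrete, the following is a minimal sketch of what such a loss could look like. It is not the authors' implementation. It assumes a diagonal Gaussian posterior and prior over a latent vector, a reconstruction error computed only on masked patches, and a KL term weighted by a hypothetical factor `beta`. The helper `toy_prior`, which widens the prior variance as the time gap `delta_t` grows, is purely illustrative; the abstract only states that the model conditions on the time difference between the two volumes.

```python
import math
import random


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians, summed over dims."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def masked_mse(target, recon, mask):
    """Mean squared error over masked patches only (mask[i] == 1 means hidden)."""
    errs = [(t - r) ** 2 for t, r, m in zip(target, recon, mask) if m]
    return sum(errs) / max(len(errs), 1)


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def toy_prior(delta_t, dim):
    """Hypothetical prior p(z | delta_t): zero mean, variance 1 + delta_t.

    Assumption for illustration only: a longer gap between scans
    means more uncertainty about how the disease has evolved.
    """
    return [0.0] * dim, [math.log(1.0 + delta_t)] * dim


def conditional_elbo_loss(target, recon, mask, mu_q, logvar_q, delta_t, beta=1.0):
    """Negative-ELBO-style loss: masked reconstruction error + beta * KL(q || p(. | delta_t))."""
    mu_p, logvar_p = toy_prior(delta_t, len(mu_q))
    return masked_mse(target, recon, mask) + beta * gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)


if __name__ == "__main__":
    rng = random.Random(0)
    mu_q, logvar_q = [0.2, -0.1], [0.0, 0.0]
    z = reparameterize(mu_q, logvar_q, rng)  # a decoder would consume z here
    loss = conditional_elbo_loss(
        target=[1.0, 2.0, 3.0], recon=[1.0, 0.0, 0.0], mask=[0, 1, 1],
        mu_q=mu_q, logvar_q=logvar_q, delta_t=0.5,
    )
    print(len(z), loss)
```

In a real model the posterior parameters, the prior, and the reconstruction would all come from learned networks operating on the two scans; the sketch only shows how a reconstruction term and a time-conditioned KL term combine into one objective.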

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Retinal Imaging and Analysis